FITTING AN ARBITRARY FUNCTIONAL RELATIONSHIP BY LEAST
SQUARES WITH ALL OF THE VARIABLES SUBJECT TO ERROR

by 


Kenneth  A.  Norton 
National  Bureau  of  Standards 
Boulder,  Colorado 


1.  INTRODUCTION 

Although too often forgotten by those attempting to apply
the method of least squares, it has been emphasized by most
authors 1/-7/ throughout the development of the theory


1/ C. F. Gauss, "Theoria Motus Corporum Coelestium,"
(Hamburg, 1809), Art. 179.

2/ C. H. Kummell, "Reduction of Observation Equations
which Contain More than One Observed Quantity," The Analyst
(Des Moines), vol. 6, no. 4, July, 1879, pp. 97-105.

3/ H. S. Uhler, "Method of Least Squares and Curve
Fitting," Jour. Optical Society of America, vol. 7, pt. 2,
Nov., 1923, pp. 1043-1066.

4/ D. V. Lindley, "Regression Lines and the Linear
Functional Relationship," Supplement to the Journal of Royal
Stat. Soc., vol. 9, no. 2, 1947, pp. 218-244.

5/ Abraham Wald, "The Fitting of Straight Lines if Both
Variables are Subject to Error," Annals of Math. Stat., vol. XI,
Sept., 1940, pp. 284-300.

6/ M. S. Bartlett, "Fitting a Straight Line when Both
Variables are Subject to Error," Biometrics, vol. V, no. 3,
1949, pp. 207-212.

7/ W. E. Deming, "Statistical Adjustment of Data,"
John Wiley and Sons, Inc., New York, 1943.
that it is necessary to assign, or in some way determine, appropriate
relative  weights  for  each  of  the  coordinates  determining  the  location 
of  each  of  the  points,  before  it  is  possible  to  determine  consistent 
estimates  of  the  parameters  determining  the  line,  curve,  or  surface 
which  is  to  be  fitted  to  these  points.  The  precision  of  consistent 
estimates  increases  with  the  number  of  observations,  and  would  give 
the  population  mean  values  if  the  number  of  observations  were  infinite. 
An  additional  requirement  for  obtaining  consistent  estimates  of  the 
parameters  by  the  method  of  least  squares  is  that  the  expected  values 
of the errors of the observations be equal to zero, i.e., the average
of  an  infinite  number  of  these  observational  errors  must  be  taken  to 
be  equal  to  zero.  A solution  is  presented  in  this  paper,  with  all  of 
the  variables  subject  to  error,  for  this  general  least  squares  problem 
of  determining  consistent  estimates  of  the  parameters  of  a functional 
relationship  which  is  expected  to  fit  exactly  the  average  of  an  infinite 
number of observations. A general method for weighting the data is
presented,  and  examples  are  given  to  illustrate  the  effects  of  varying 
the  statistical  characteristics  of  the  observed  data.  These  examples 
demonstrate  the  necessity,  if  consistent  estimates  of  the  parameters 
are  to  be  obtained,  of  modifying  the  original  design  of  the  experiment 
in  many  cases  in  order  to  obtain  data  in  the  particular  form  required 
by the statistical model described in this paper. It cannot be emphasized
too strongly that the use of statistical models not representative
of the experimental data cannot yield consistent estimates of the
parameters  of  the  functional  relationship  actually  describing  these 
data.  Thus,  only  to  the  extent  that  the  experimenter  can  demonstrate 
the  validity  of  the  statistical  model  used,  can  he  expect  to  obtain  by 
its  use  consistent  estimates  of  the  parameters. 

The  sophisticated  scientist,  recognizing  at  the  outset  that  all 
of  his  conclusions  based  on  measurements  are  only  relatively  correct, 
turns  to  mathematical  statistics  as  a means  of  assigning  quantitative 
probabilities  to  the  degree  of  his  belief.  Throughout  this  paper  an 
attempt  is  made  to  determine  confidence  bands  corresponding  to 
specified  probabilities  for  the  various  statistics  determined  by  least 
squares.  In  most  cases  only  approximate  confidence  bands  are  now 
known  for  many  of  the  important  least  squares  statistics,  and  the 
development  of  more  precise  solutions  would  involve  a very  considerable 
complication  of  the  analysis.  Fortunately,  great  accuracy  in  the 
absolute magnitudes of the confidence bands is usually of secondary
importance since probabilities are usually assigned to the proposed
significance  levels  in  a somewhat  subjective  and  arbitrary  manner. 

It  is  important,  however,  that  the  variation  in  the  magnitudes  of  these 
confidence bands with relevant statistical parameters, such as the
numbers of observations and the numbers of parameters fitted, be
relatively correct, and every effort has been made to ensure this
result. It is important to note in this connection that the determination
of only approximate confidence bands for the statistics of a
model  actually  representative  of  the  experimental  data  is  much  more 
desirable  than  the  determination  of  exact  confidence  bands  for 
statistics  determined  from  a simpler  statistical  model  which  is  not 
representative  of  the  data. 


In this paper each of the n points to be fitted in a k-dimensional
space is considered to be represented by the means, variances
and covariances of a series of observations on each of its k variables.
Thus the n points in our two-dimensional examples (k = 2) are considered
to be samples from n separate and independent bivariate
distributions.  Estimated  values  of  the  parameters  (two  means,  two 
variances  and  one  covariance)  defining  each  of  these  distributions, 
and  thus  each  of  the  points,  are  considered  to  be  known  on  the  basis 
of  these  measurements.  Aside  from  being  more  general  than  the  usual 
textbook  expositions,  in  which  the  expected  values  of  these  variances 
and  covariances  are  assumed  to  be  the  same  for  all  n points,  it  is 
believed  that  the  above  point  of  view  is  often  more  realistic  since  this 
is  the  way  experimental  data  frequently  present  themselves  to  the 
analyst.  For  example,  the  individual  coordinates  of  each  of  the  points 
to  be  fitted  are  often  the  means  of  samples  from  populations  with 
different  variances  when  obtained  by  the  same  experimenter  in  the 
same  laboratory,  and  will  even  more  likely  be  so  when  these  points 
have been obtained as the result of observations by different
experimenters in different laboratories.


Many analysts have been disturbed 4/, 8/ when using linear
regression theory to fit a straight line to a set of n points, $X_i$, $Y_i$,
to find that a different line is determined when all of the variance is
assigned to the values of $Y_i$ than when all of the variance is assigned
to the values of $X_i$. In those cases where no independent information
is available as to the relative weights to assign to the $Y_i$ and $X_i$
observations, Wald 5/ has developed a method for determining not
only consistent estimates of the parameters defining the line, but also

8/ Joseph Berkson, "Are There Two Regressions?" Jour.
Amer. Stat. Assoc., vol. 45, no. 250, June, 1950, pp. 164-180.
confidence intervals for these parameters and estimates of the variances
of the $Y_i$ and $X_i$ observations, but only on the hypotheses (1) that the
expected values of the variances of $X_i$ and $Y_i$ are the same for all n
points, (2) that the unknown errors are sufficiently small so that the
classification of the $X_i$ (or $Y_i$) into two groups in accordance with
their magnitudes will be the same as the corresponding classification
of the unknown population mean values $X_{io}$ (or $Y_{io}$), (3) that $X_i$ and
$Y_i$ are normally distributed, and (4) that their covariances are zero.

Only the first two of the above conditions are required for obtaining
consistent estimates of the parameters, while the other two conditions
are required in order to permit the assignment of probabilities
to the confidence intervals. Bartlett has shown how to improve the
efficiency of Wald's method, and Bennett and Franklin 9/ have described
a method of analysis for fitting a straight line involving a statistical
model which is in many respects the same as the model considered
in this paper. The present paper deals initially with the case in which
independent estimates are available to the analyst of the variances and
covariances of each of the $X_i$ and $Y_i$ observations. We will see that,
when all of this information is independently available, either by
measurement or assumption, tests can be made either (1) of the validity of
the assumed functional relationship or (2) for the presence of random
"systematic" errors. The designation "systematic" error is used
consistently throughout this paper to refer to that component of error
of a particular observation point which is not reduced by making repeated
observations of its coordinates. If the tests indicate that the assumed
functional relationship is compatible with the observed data, consistent
estimates of its parameters can be obtained. These estimated parameters
are, of course, derived from samples of observed populations
of the two random variables $X_i$ and $Y_i$, and are consistent estimates
of the "true" values of the parameters only to the extent that none of
these observed populations have any constant bias in their means
relative to the "true" values of these random variables. It is shown
that a constant bias of this kind cannot be detected by least squares.
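
The dependence of the fitted line on where the variance is assigned, and the grouping estimate attributed to Wald above, can be illustrated numerically. The following is only a minimal sketch on simulated data, assuming NumPy is available; the function names `ols_slope` and `wald_line`, the parameter values, and the two-cluster design for the true abscissas are all hypothetical choices made for illustration and are not taken from this paper.

```python
import numpy as np

def ols_slope(u, v):
    """Slope of the ordinary least squares regression of v on u."""
    du, dv = u - u.mean(), v - v.mean()
    return float(np.sum(du * dv) / np.sum(du * du))

def wald_line(x, y):
    """Grouping estimate: split the points into halves by observed x."""
    order = np.argsort(x)
    half = len(x) // 2
    lower, upper = order[:half], order[-half:]
    slope = (y[upper].mean() - y[lower].mean()) / (x[upper].mean() - x[lower].mean())
    intercept = y.mean() - slope * x.mean()
    return float(intercept), float(slope)

rng = np.random.default_rng(0)
n, alpha, beta = 2000, 1.0, 2.0
# True abscissas lie in two separated clusters, so that grouping by the
# observed X rarely differs from grouping by the true values (hypothesis 2).
x_true = np.concatenate([np.linspace(0.0, 4.0, n // 2),
                         np.linspace(6.0, 10.0, n // 2)])
x = x_true + rng.normal(0.0, 1.0, n)                  # X observed with error
y = alpha + beta * x_true + rng.normal(0.0, 1.0, n)   # Y observed with error

b_yx = ols_slope(x, y)             # all of the variance assigned to Y
b_xy_inv = 1.0 / ols_slope(y, x)   # all of the variance assigned to X, as dY/dX
a_w, b_w = wald_line(x, y)         # grouping estimate
print(b_yx, b_xy_inv, b_w)
```

In this sketch the two regression slopes bracket the grouping estimate only loosely; the point is that the regression of Y on X falls noticeably below the slope used to generate the data, while the regression of X on Y, expressed as a slope in the same units, is never smaller than it.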

If  the  tests  indicate  the  presence  of  random  "systematic" 
errors,  methods  are  given  for  including  their  effects  in  the  fitting. 

It  is  shown,  however,  that  the  proper  inclusion  of  these  effects  of 
such  random  "systematic"  errors  usually  cannot  be  determined  from  a 


9/ Carl A. Bennett and Norman L. Franklin, "Statistical
Analysis," John Wiley and Sons, 1954, p. 463. These methods appear
to be largely drawn from J. W. Tukey, "Components in Regression,"
Biometrics, 7, 1951, pp. 33-69.
statistical analysis of the experimental data alone and must be based
on ad hoc assumptions supplied by the experimenter. In particular,
when fitting a straight line, the variance arising from these random
"systematic" errors cannot, by statistical analysis alone, be subdivided
into its three components: (1) a possible systematic error
variance of $X_i$, (2) a possible systematic error variance of $Y_i$, and
(3) the covariance of these two possible sources of random systematic
error.


Berkson 8/ has proposed a most desirable experimental procedure
for taking data intended to be fit to a linear functional relationship
with both variables subject to error. He calls this procedure a
controlled experiment and shows that great simplification in the
statistical analysis results from its use. In fact it can be shown that
the analysis of measurements made by this procedure with both
variables subject to error can be reduced to a problem involving only
one variable subject to error. This procedure and method of analysis
are described, and a brief indication given of how its great advantages
may be extended to the general problem of fitting an arbitrary
functional relationship with all of the variables subject to error.


It is shown for all of the statistical models discussed in this
paper how to "adjust" the experimental observations to those particular
values which jointly have a minimum weighted mean square deviation
relative to all of the experimental data and of the assumed functional
relationship. In the absence of systematic errors, these "adjusted"
values will also be consistent estimates of the population mean values
of the n experimental points. By directing the attention of the
experimenter to the estimated errors (random plus "systematic")
thus determined for each coordinate of each of his observed points,
the adjustment often makes possible a better understanding of the
nature and source of these errors.


Methods  are  given  for  calculating  (1)  confidence  bands  for  the 
least  squares  estimates  of  each  of  the  parameters  of  the  functional 
relationship  considered  independently,  (2)  elliptical  confidence  regions 
for two or more of these parameter estimates considered jointly,
(3) confidence regions for the fitted functional relationship, and (4)
confidence  regions  for  future  values  predicted  by  the  use  of  the  fitted 
functional  relationship. 

The  primary  purpose  of  this  paper  is  tutorial,  although  some 
results  are  presented  which  are  believed  to  be  new,  and  errors  in 
previous solutions are corrected. By a slight modification in approach,
some of the complexity of the usual expositions has been avoided,
while  at  the  same  time  other  aspects  of  the  problem  have  been 
generalized. 

Many  of  our  results  are  contained  in  the  general  solution 
of this problem obtained by Kummell in 1879 2/ but, possibly because
he presented no illustrative examples, the full significance and
importance of Kummell's results appear to have been overlooked by
subsequent  authors. 

An  effort  is  made  in  the  following  presentation  to  point  out 
the  wide  variety  of  solutions  to  be  expected  on  the  basis  (a)  of 
different  experimental  information  or  of  different  initial  assumptions 
relative  to  the  weights  to  be  assigned  to  the  individual  observed  points 
or  (b)  of  different  assumptions  relative  to  the  form  of  the  functional 
relationship  fitted  to  the  data.  Thus  it  cannot  be  emphasized  too 
strongly  that  the  method  of  least  squares  should  not  be  used  blindly 
by  the  experimenter  in  the  hope  that  somehow  this  will  provide  in 
some  magic  way  a best  solution.  Instead,  careful  consideration 
should  be  given  first  to  the  form  of  the  function  to  be  fitted  to  the 
data,  and  then  to  the  nature  and  reliability  of  each  experimental 
point.  A solution  by  least  squares  does  make  possible  the  efficient 
use  of  all  of  the  information  available  to  the  experimenter,  but  it 
can  never  give  results  which  are  any  better  than  the  assumptions 
and  experimental  data  used. 

Some  experimenters  use  least  squares  only  in  those  cases 
where  it  is  quite  clear  from  a plot  of  their  data  that  the  assumption 
under  consideration  (linearity  of  the  assumed  relation,  for  example) 
is  well  established  and  would  not  use  the  method  when  the  points  are 
widely  scattered.  It  seems  to  the  author  that  this  latter  attitude 
reveals  a lack  of  appreciation  of  the  scope  of  the  method,  since  it 
is  frequently  in  just  those  cases  where  the  data  are  widely  scattered 
that the method of least squares is most useful by providing a significant
quantitative evaluation of the reality of the assumed relations.
It  is  important  to  remember  that  the  results  of  many,  even  properly 
conducted,  physical  experiments  yield  widely  scattered  data  because 
of the impossibility of controlling the influence of many of the variables,
and this results in the introduction of large errors. It is quite
clear  that  these  uncontrollable  experiments  should  receive  as  much, 
if  not  more,  careful  attention  from  the  analyst  as  those  experiments 
for  which  carefully  controlled  conditions  are  feasible.  In  this  paper 
a more precise estimate of the standard errors is given which allows
for  the  second  order  effects  which  arise  with  widely  scattered  data. 

Even granting the possibility of obtaining the above information
by the application of the method of least squares to the analysis
of a set of experimental data, the question as to when the complexities
of analysis by this method are justified is a difficult one, and presumably
must finally be conceded to be a matter of judgment with the
individual  analyst.  In  any  case,  before  deciding  this  question,  the 
experimenter  should  know  what  kinds  of  additional  information  are 
likely  to  become  available  from  the  application  of  the  method,  and  it 
is  the  purpose  of  this  paper  to  supply  that  information  in  detail  for  the 
particular  case  of  fitting  a linear  functional  relationship  to  a set  of 
experimentally  determined  points.  Although  most  of  the  detailed 
examples  presented  are  confined  to  an  assumed  linear  relationship, 
the  theory  presented  is  developed  for  the  general  case  of  an  arbitrary 
functional relationship, and the normal equations given provide a
solution  for  this  general  case  with  all  of  the  variables  subject  to 
error. 


In  the  case  of  fitting  data  to  a straight  line,  it  is  shown  that 
the entire effect of introducing errors in the independent as well as
in the dependent variable enters into the solution for the parameters
by way of second order terms in the residuals and terms involving the
differentiation of the weights. The resulting effect on the estimated
parameters is often quite large. The necessity for retaining these
additional  terms  makes  our  solution  inherently  more  complex  than 
the  usual  solutions  in  which  only  the  dependent  variable  is  assumed 
to  be  subject  to  error.  Unfortunately  the  determination  of  the  sampling 
distribution  for  the  estimated  parameters  is  also  inherently  more 
complicated  in  the  general  case,  and  precise  analytical  expressions 
for  these  sampling  distributions  are  not  yet  available.  Nevertheless 
it  seems  quite  clear  that  it  is  more  desirable,  when  both  variables 
are  subject  to  error,  to  obtain  consistent  estimates  of  the  parameters 
and  only  rough  estimates  of  their  errors  by  the  methods  described  in 
this  paper  than  to  follow  the  current  practice  of  arbitrarily  suppressing 
the  influence  of  the  errors  of  the  independent  variables,  thus  obtaining 
incorrect  estimates  of  the  parameters  and  confidence  regions  for  these 
incorrect  estimates  which  are  precise  only  on  the  false  assumption 
that  the  errors  of  the  independent  variable  may  be  suppressed. 
The ultimate aim of most analyses by least squares is the
prediction of one variable from the observed values of another, and
it is shown in this paper that the estimated functional relationship
should be used for this purpose. An example of the problem of
prediction is the following. Suppose n pairs of mean values $X_i$, $Y_i$
(i = 1 to n) are determined by averaging $m_i$ observations on n different
populations which have n different unknown population mean values
$X_{io}$, $Y_{io}$. It is supposed that $X_i$ and $Y_i$ are normally distributed
about $X_{io}$ and $Y_{io}$ with unknown variances $\sigma_{\epsilon i}^2/m_i$ and
$\sigma_{\eta i}^2/m_i$, respectively, and that $Y_{io} = \alpha + \beta X_{io}$ where
$\alpha$ and $\beta$ are unknown constants. A new mean value $X_j$ is
determined by averaging $m_j$ observations of X from the $j$th population
with unknown population mean values $X_{jo}$, $Y_{jo}$ which are also
assumed to be related by $Y_{jo} = \alpha + \beta X_{jo}$. It is desired to
estimate the population mean value $Y_{jo}$ by means of the observed mean
value $X_j$, and it is shown that a consistent estimate of $Y_{jo}$ is
$a + b X_j$ where a and b are the least squares estimates of $\alpha$ and
$\beta$, approaching these values in the limit as all of the $m_i$
(i = 1 to n) approach infinity.


The above is believed to be a more realistic statement of
the problem of prediction than the more usual statement that we
wish to know the expected value of Y for a given observed value $X_j$
without specifying that $X_j$ is from the $j$th population with a fixed, even
if unknown, population mean value $X_{jo}$. For example, if we have
estimated the density, $\rho$, of a certain kind of steel by measuring
the n weights, $Y_i$, and the n volumes, $X_i$, of n steel balls made with
this steel, we may estimate the weight, $Y_{jo}$, of a $j$th steel ball by
$bX_j$ where b is the least squares estimate of $\rho$ obtained on the
assumptions (1) that $Y_i$ and $X_i$ are each measured with error and (2) that the
true density $\rho$ is the same for all n + 1 balls, i.e., the assumption that
a linear functional relationship exists. Note that the $j$th steel ball has
a fixed true weight $Y_{jo}$ and a fixed true volume $X_{jo}$. Now the expected
value of the observed volume $X_j$ for the $j$th steel ball is obviously its
true volume $X_{jo}$, and it follows that a consistent estimate of the weight
$Y_{jo}$ of the $j$th steel ball (given the steel ball and its measured volume
$X_j$) is $bX_j$. Now consider Lindley's 4/ formulation and solution of the
problem of prediction. Lindley assumes, in effect, that an infinite
population of steel balls exists and that the true volumes $X_{io}$ are normally
distributed* about a mean value $X_o$ with variance $\sigma_o^2$; he further
assumes that the true density $\rho$ is the same for all of the steel balls
and that the variances of the errors of measurement have the same
magnitudes $\sigma_\eta^2$ for weight and $\sigma_\epsilon^2$ for volume independent of the size
of the balls being measured. He then purports to show that an unbiased
estimate of the weight of the $j$th ball is $b' X_j$ where $b'$ is the regression
coefficient of the n random observed values of $Y_i$ on $X_i$. However, he
also shows that $b'$ approaches the constant value

$$\gamma = \frac{(X_o^2 + \sigma_o^2)\,\rho}{X_o^2 + \sigma_o^2 + \sigma_\epsilon^2}$$

as n approaches infinity or, more generally,
that the average value of $b'$ equals $\gamma$ where the individual values of $b'$
are determined from many random finite samples with n balls in each
sample. Let us assume that $\gamma$ is determined exactly by determining
the regression of $Y_i$ on $X_i$ by measuring an infinite number of steel
balls chosen at random from the population. Now select one more
steel ball and measure its volume $X_j$; Lindley states that an unbiased
estimate of the weight of this $j$th steel ball is $Y'_j = \gamma X_j$. But we may
measure the volume of this $j$th steel ball an infinite number of times,
and in this way we might hope to determine the true weight $Y_{jo}$;
however, the mean value $\bar{Y}'_j = \gamma \bar{X}_j = \gamma X_{jo} < Y_{jo} = \rho X_{jo}$ and we conclude
that Lindley's prediction leads to a biased estimate of the true value
of the weight of the $j$th steel ball. If we had instead used the infinite
sample of $Y_i$, $X_i$ to estimate b by least squares with both variables
subject to error, the estimate so obtained would be equal to $\rho$, and
now the mean value $\bar{Y}_j = \rho \bar{X}_j = \rho X_{jo} = Y_{jo}$, i.e., the use of the
average of a large number of measurements of the volume of the $j$th
steel ball would lead in this way to the true weight of this ball. Lindley's
expected values are obtained by averaging over the entire population
of steel balls; thus he has shown that a second independent set of
measurements of n balls chosen at random from the same population
would lead, as n approaches infinity, to the same biased estimate, $\gamma$,
of the density, provided this second set of measurements of volume
were made with instruments having the same precision so that
$\sigma_\epsilon^2$ is the same. Only in this most unsatisfactory sense can Lindley claim


* Since a volume cannot be negative, this assumption cannot
be strictly true, but the distribution of $X_{io}$ can approximate a normal
distribution if $\sigma_o \ll X_o$.
that predictions based on simple regression theory are unbiased;
such predictions are not even consistent since they disagree with
the average of a large number of measurements made on the $j$th
ball.
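
The steel-ball argument can be checked numerically. The sketch below, assuming NumPy, simulates a large population of balls under the stated assumptions and compares the regression coefficient of Y on X through the origin with the limiting value $\gamma$ quoted above. The numerical values (density 7.8, mean volume 10, and so on) and the names `lindley_gamma`, `x_jo` and `x_bar_j` are hypothetical and chosen only for illustration.

```python
import numpy as np

def lindley_gamma(rho, x_o, sigma_o, sigma_eps):
    """Limiting value of the regression coefficient b' quoted in the text."""
    return rho * (x_o**2 + sigma_o**2) / (x_o**2 + sigma_o**2 + sigma_eps**2)

rng = np.random.default_rng(1)
rho, x_o, sigma_o = 7.8, 10.0, 1.0   # density, mean and spread of true volumes
sigma_eps, sigma_eta = 2.0, 1.0      # measurement errors of volume and of weight
n = 200_000

x_true = rng.normal(x_o, sigma_o, n)
x = x_true + rng.normal(0.0, sigma_eps, n)         # measured volumes
y = rho * x_true + rng.normal(0.0, sigma_eta, n)   # measured weights

# Regression of Y on X through the origin, with the errors in X ignored.
b_prime = float(np.sum(x * y) / np.sum(x * x))
gamma = lindley_gamma(rho, x_o, sigma_o, sigma_eps)

# One further ball whose volume is measured many times: the mean of the
# measurements approaches the true volume x_jo.
x_jo = 12.0
x_bar_j = float(np.mean(x_jo + rng.normal(0.0, sigma_eps, n)))

print(b_prime, gamma)                # nearly equal, and both below rho
print(gamma * x_bar_j, rho * x_jo)   # regression prediction vs. true weight
```

However many times the volume of the extra ball is measured, the prediction $\gamma \bar{X}_j$ stays below the true weight $\rho X_{jo}$ in this sketch, which is the inconsistency described above.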

Predictions made using the fitted functional relationship will,
in most cases, be biased; but they are always at least consistent,
and they have the advantage that no special assumptions need be made
as to the distribution of the true values $X_{io}$. Furthermore, the bias
of predictions made using the fitted functional relationship will
always* be less than the bias of predictions made using simple regression
analysis with the errors in the independent variable neglected.


The above discussion of prediction relates to the case in
which a functional relationship is assumed to exist. However, it also
follows from the above argument that predictions of a dependent
variable Y made by the use of regression analysis of data involving
one or more independent variables which are measured with error
will also be biased. For example, if $\sigma_o^2$ denotes the variance of the
true values of the independent variable and $\sigma_X^2$ the variance of the
errors of measurement of the independent variable, then an unbiased
prediction of Y is given by $(Y - \bar{Y}) = b'(X - \bar{X})(\sigma_o^2 + \sigma_X^2)/\sigma_o^2$ where $b'$
is the biased slope of the regression of Y on X with the errors in X
ignored. In practice the bias correction $(\sigma_o^2 + \sigma_X^2)/\sigma_o^2$ might be
estimated by $S_X^2/(S_X^2 - s_X^2)$ where $S_X^2$ is the sample variance of the
$X_i$ about their mean value $\bar{X}$ and $s_X^2$ is an estimate of the variance
of the errors of measurement of the $X_i$.
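
The bias correction just described can be sketched in a few lines. This is an illustrative sketch only, assuming NumPy and simulated data; the function name `corrected_slope`, the argument `s2_err` (standing for the independent estimate of the error variance of the X values), and the numerical values are hypothetical.

```python
import numpy as np

def corrected_slope(x, y, s2_err):
    """Return (b', b' * S^2 / (S^2 - s^2)) for the regression of y on x.

    S^2 is the sample variance of the observed x about their mean and
    s2_err is an independent estimate of the error variance of x.
    """
    dx, dy = x - x.mean(), y - y.mean()
    big_s2 = float(np.sum(dx * dx) / (len(x) - 1))
    if big_s2 <= s2_err:
        raise ValueError("error variance estimate exceeds the sample variance")
    b_prime = float(np.sum(dx * dy) / np.sum(dx * dx))
    return b_prime, b_prime * big_s2 / (big_s2 - s2_err)

rng = np.random.default_rng(2)
n, alpha, beta = 5000, 1.0, 2.0
x_true = rng.normal(0.0, 2.0, n)                    # variance of true values: 4
x = x_true + rng.normal(0.0, 1.0, n)                # error variance of X: 1
y = alpha + beta * x_true + rng.normal(0.0, 1.0, n)

b_prime, b_corr = corrected_slope(x, y, s2_err=1.0)
print(b_prime, b_corr)
```

With these assumed variances the uncorrected slope is expected to be near 4/5 of the slope used to generate the data, and the corrected slope near the generating value; the correction becomes unstable when the sample variance is only slightly larger than the error variance, which is why the sketch refuses to divide in that case.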


* The mathematical proof is not yet available, but this
conclusion seems reasonable since the functional relationship analysis
is designed to provide a first order correction for this particular
bias. In a certain sense predictions made using the fitted functional
relationship can actually be shown to be unbiased; see page 2.18.


2.  FITTING  A STRAIGHT  LINE 


Let us begin with a precise formulation of the problem. We
consider 2n sets of random variables, $X_{it}$, $Y_{it}$, t = 1 to $m_i$, i = 1 to n
with $m_i > 1$ and n > 1. A random variable is a real variable with an
associated probability distribution. The expected values of these
random variables are $E(X_{it}) = X_{io}$ and $E(Y_{it}) = Y_{io}$. The expected
value is the average of the values in a large sample as the sample size
goes to infinity; or, more precisely, the expected value of a random
variable is its first moment, i.e., its average value weighted in
accordance with its probability distribution. $X_{io}$ and $Y_{io}$ are also
called the population mean values of these random variables while
the random variables $(X_{it} - X_{io}) = \epsilon_{it}$ and $(Y_{it} - Y_{io}) = \eta_{it}$ are called
errors, although a large component of such deviations from the
population mean values may arise in some applications from the natural
variations of the phenomena under investigation. Regardless of
the cause of the deviations, we take the expected values $E(\epsilon_{it}) = 0$
and $E(\eta_{it}) = 0$. The random variables $\epsilon_{it}$ and $\eta_{it}$ within the $i$th group
are assumed to have the same bivariate probability distribution, to
be independent and thus uncorrelated, i.e., $E(\epsilon_{it}^2) = \sigma_{\epsilon i}^2$, $E(\eta_{it}^2) = \sigma_{\eta i}^2$,
and $E(\epsilon_{it}\eta_{it}) = \rho_i \sigma_{\epsilon i} \sigma_{\eta i}$ while $E(\epsilon_{it}\epsilon_{iu}) = 0$, $E(\eta_{it}\eta_{iu}) = 0$, and
$E(\epsilon_{it}\eta_{iu}) = 0$ for $t \neq u$. The variances of $\epsilon_{it}$ and of $\eta_{it}$ are all finite,
but may differ from each other and from one group to the next. The
observations in the different groups (i = 1 to n) are independent and
thus uncorrelated: $E(\epsilon_{it}\epsilon_{ju}) = 0$ and $E(\eta_{it}\eta_{ju}) = 0$ for $i \neq j$. It is
anticipated that the general approach here developed may be extended to
the case of fitting autocorrelated data, but this is beyond the scope of
the present paper. Finally it is assumed that the population mean
values $X_{io}$ and $Y_{io}$ are exactly related by the linear relation
$Y_{io} = \alpha + \beta X_{io}$; $\alpha$ and $\beta$ denote the "true" values of the
parameters of this linear relation. By virtue of these assumptions, the
model described in this section excludes "systematic" errors by
definition. More general models with explicit allowance for random
systematic errors are presented in later sections.


Let the $i$th observation point be defined by the mean values
$X_i = \frac{1}{m_i}\sum_{t=1}^{m_i} X_{it}$ and $Y_i = \frac{1}{m_i}\sum_{t=1}^{m_i} Y_{it}$. Our problem is to fit the straight
line

$$Y = a + bX \qquad (2.1)$$


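to the n points, $X_i$, $Y_i$. In (2.1) a and b denote the estimated values
of the parameters $\alpha$ and $\beta$. Unbiased estimates of the variances of
$X_{it}$, $Y_{it}$, $X_i$, and $Y_i$ are:

$$s_{\epsilon i}^2 = \frac{1}{m_i - 1}\sum_{t=1}^{m_i}(X_{it} - X_i)^2, \qquad s_{\eta i}^2 = \frac{1}{m_i - 1}\sum_{t=1}^{m_i}(Y_{it} - Y_i)^2 \qquad (2.2)$$

$$s_{X_i}^2 = \frac{1}{m_i}\, s_{\epsilon i}^2, \qquad s_{Y_i}^2 = \frac{1}{m_i}\, s_{\eta i}^2 \qquad (2.3)$$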


The  estimate  from  the  sample  in  the  i^^  group  of  the  correlation 

coefficient,  r.,  between  X.^  and  Y.  . which  also  characterizes  the 
1 it  it 

correlation  between  the  errors  in  the  coordinates  X.  and  Y of  the 

• th  ^ i 

1 observation  point,  is  given  by: 


r.  s . s . = s 

1 €1  T|1  €T11 


i I 

t = 1 


(2.4) 
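The estimates (2.2) to (2.4) can be sketched for a single group as follows. This is only an illustration: the function name `group_summary`, the dictionary keys, and the five-observation sample are assumptions made for the example and do not come from the paper.

```python
from math import sqrt
from statistics import mean, variance


def group_summary(x_obs, y_obs):
    """Summarize one group of m paired observations (X_it, Y_it)."""
    m = len(x_obs)
    x_bar, y_bar = mean(x_obs), mean(y_obs)
    s2_eps = variance(x_obs)  # (2.2): divisor m - 1
    s2_eta = variance(y_obs)
    # (2.4): sample covariance of the single observations
    s_eps_eta = sum((x - x_bar) * (y - y_bar)
                    for x, y in zip(x_obs, y_obs)) / (m - 1)
    return {
        "X": x_bar, "Y": y_bar,
        "s2_eps": s2_eps, "s2_eta": s2_eta,
        "s2_X": s2_eps / m, "s2_Y": s2_eta / m,  # (2.3): variances of the means
        "r": s_eps_eta / sqrt(s2_eps * s2_eta),
    }


# Hypothetical group of m = 5 observations (illustrative values only).
summary = group_summary([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

Each of the $n$ groups would be summarized in this way before any fitting is attempted; the fitting itself uses only the means, the variances of the means, and $r_i$.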


The numerical values of the three coordinates and their variances for the three-point examples discussed in detail in this paper are given in Table 2.1.


TABLE 2.1*

  i    $m_i$    $X_i$    $s_{\epsilon i}^2$    $s_{Xi}^2$    $Y_i$    $s_{\eta i}^2$    $s_{Yi}^2$
  1      5        2            5                 1           2           10               2
  2      5        6           20                 4           4           15               3
  3      5        8           25                 5           8           30               6

* The  sample  values  in  this  table  are  unrealistic  since,  in 
practice,  the  probability  of  finding  such  round  numbers  would  be  near 
zero;  these  round  numbers  were  chosen  for  convenience  only.  See 
Section  8 for  an  example  with  a more  typical  set  of  numbers. 
Using three different values of $r_i$ = 0, -0.9, and +0.9, together with the values given in Table 2.1, we obtain, by the methods to be discussed in detail in the sequel, the three solutions shown by the three different lines on Figs. 2.1, 2.2, and 2.3. The lines associated with each of the three points on Fig. 2.1 extend a distance equal to one standard error and are intended to provide a visual indication of the uncertainties in each of the three observed values. The meaning of the ellipses shown on these three figures will be explained later.

The necessity for estimating the standard errors, $s_{Xi}$ and $s_{Yi}$, associated with each point before applying the method of least squares will undoubtedly appeal to most physicists as a most natural course of action and may lead, in many cases, to greater care in the planning of experiments. The variances, $s_{Xi}^2$ and $s_{Yi}^2$, may be considered to arise from experimental errors in observation of the coordinates of each point, or may be considered to arise from natural variations of these quantities. For example, the population mean of each observation point, $X_{io}$, $Y_{io}$, may be thought of as corresponding to a given fixed setting (as of a rheostat, for example) of the experimental system, and $X_i$, $Y_i$ might then be the averages of a series of $m_i$ observations $X_{it}$, $Y_{it}$ taken at that setting. Note that the $m_i$ values observed within the $i$th group are assumed to be samples from the same statistical population; in the language of the physicist these $m_i$ values are considered to have been obtained under the same experimental conditions. On the other hand, the observed values in the $n$ different groups may each be from statistical populations with different variances as well as different population mean values, and this will often be the case in practice.

Let $V_{Yi} = (Y_i - a - bX_i)$ denote the deviation in the Y direction of the $i$th point from the fitted line, and let $w(V_{Yi})$ denote the weight assigned to the deviation $V_{Yi}$. Our present problem is to determine the values of $a$ and $b$ in (2.1) which will minimize $S(a, b)$, the weighted sum of the squares of the deviations of the points from the line:

$$S(a, b) = [w(V_{Yi})\, V_{Yi}^2] \tag{2.5}$$*


* This dual role of the symbols $a$ and $b$, i.e., as variables in this expression for $S(a, b)$ and as the particular constants which minimize $S$, should not lead to as much confusion as the use of two sets of symbols.
Figure 2.1
Figure 2.2
Figure 2.3

Following Gauss, the square brackets [ ] are used throughout this paper to denote the sum of the $n$ values so enclosed and corresponding to the $n$ independent points to which the functional relationship is to be fitted. It will be convenient to define the weight of a deviation to be the reciprocal of its variance as estimated by means of the following particular approximate formula for this variance:

$$\frac{1}{w(V_{Yi})} = s_{V_{Yi}}^2 = s_{Yi}^2 - 2 b\, r_i\, s_{Yi}\, s_{Xi} + b^2 s_{Xi}^2 = s_i^2 = \frac{1}{w_i} \tag{2.6}$$*

The above approximate expression for the variance follows directly from the above expression for $V_{Yi}$ and the law for the propagation of variance.† It will become evident as we proceed that the definition of the weight in terms of this particular approximation to the variance will lead to consistent estimates of the parameters $\alpha$ and $\beta$. With $a$ and $b$ now representing the particular values of these variables which minimize $S(a, b)$, (2.5) may be expressed:

$$S(a, b) = [w(V_{Yi})(Y_i - a - bX_i)^2] = [(V_{Yi}/s_i)^2] = \text{minimum} \tag{2.7}$$

The alert reader will raise the question as to why we have chosen to measure our deviations, $V_{Yi}$, in the Y direction, and the answer is that the direction chosen is immaterial since $S$ is invariant to a homogeneous strain, translation, or rotation of the coordinate axes; the logical necessity for this invariance in least squares was pointed out by Roos (Metron, 1937). In particular, it is easy to show that $[w(V_{Xi})\, V_{Xi}^2]$ is identical to (2.7); thus

$$V_{Xi} = \left(X_i + \frac{a}{b} - \frac{1}{b} Y_i\right) = -\frac{1}{b} V_{Yi}$$

and

$$\frac{1}{w(V_{Xi})} = s_{V_{Xi}}^2 = s_{Xi}^2 - \frac{2}{b}\, r_i\, s_{Xi}\, s_{Yi} + \frac{1}{b^2}\, s_{Yi}^2 = \frac{s_i^2}{b^2}.$$
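The identity of the two weighted sums can be checked numerically. The sketch below uses the means and mean variances of Table 2.1; the function names and the trial values of $a$, $b$, and $r$ are assumptions chosen only for illustration, and a common $r$ is used for all three points.

```python
from math import sqrt

# Means and variances of the means from Table 2.1.
X = [2.0, 6.0, 8.0]
Y = [2.0, 4.0, 8.0]
S2_X = [1.0, 4.0, 5.0]
S2_Y = [2.0, 3.0, 6.0]


def S_from_Y_deviations(a, b, r):
    """Weighted sum of squares of deviations measured in the Y direction."""
    total = 0.0
    for Xi, Yi, vx, vy in zip(X, Y, S2_X, S2_Y):
        V_Y = Yi - a - b * Xi
        var_V_Y = vy - 2.0 * b * r * sqrt(vx * vy) + b * b * vx  # (2.6)
        total += V_Y ** 2 / var_V_Y
    return total


def S_from_X_deviations(a, b, r):
    """Same sum with deviations measured in the X direction."""
    total = 0.0
    for Xi, Yi, vx, vy in zip(X, Y, S2_X, S2_Y):
        V_X = Xi + a / b - Yi / b
        var_V_X = vx - (2.0 / b) * r * sqrt(vx * vy) + vy / (b * b)
        total += V_X ** 2 / var_V_X
    return total


# Arbitrary trial lines; the two sums agree for any a, nonzero b, and |r| < 1.
trials = [(-0.038, 0.879, 0.0), (0.5, 1.3, 0.9), (1.0, -0.7, -0.9)]
differences = [abs(S_from_Y_deviations(*t) - S_from_X_deviations(*t)) for t in trials]
```

The agreement holds term by term, since each deviation and its variance are rescaled by the same factor $1/b$ and $1/b^2$ respectively.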


* Note that $s_i$ is finite and positive except in the trivial cases $r_i = \pm 1$ and $b\, s_{Xi} = r_i\, s_{Yi}$.

† Terms involving the variances $s_a^2$ and $s_b^2$ of the random variables $a$ and $b$ were omitted in deriving (2.6).

C. F. Roos, "A General Invariant Criterion of Fit for Lines and Planes where all Variates are Subject to Error," Metron, Feb., 1937.
The proof of the more general invariance of $S$ stated above is given in Section 3. Throughout the remainder of this and the following sections the symbol $w_i$ will be understood to represent $w(V_{Yi})$ and the symbol $s_i^2$ will be understood to represent $s_{V_{Yi}}^2$.

Since $S$ is to be a minimum for the least squares determination of the values of $a$ and of $b$, it follows that the partial derivatives of $S$ with respect to $a$ and $b$ must both be equal to zero. We will consider first the partial derivative with respect to $a$:

$$-\frac{1}{2}\frac{\partial S}{\partial a} = [w_i (Y_i - a - bX_i)] = 0 \tag{2.8}$$

$$a = \frac{[w_i Y_i]}{[w_i]} - b\, \frac{[w_i X_i]}{[w_i]} = \bar{Y} - b\bar{X} \tag{2.9}$$

The point $\bar{X}$, $\bar{Y}$ is what Deming 7/ calls the quasi center of gravity.


The first step in the series of calculations required for our solution is the determination of the $n$ values of $w_i$ corresponding to the $n$ pairs of observations. The reader will detect a logical difficulty in our development at this point, since he is asked to use a formula for $w_i$ which involves the value of the so far unknown constant $b$. This is indeed a logical difficulty, but it is overcome in practice simply by using an estimated value, say $b_0$, in (2.6) for the initial determination of $w_i$ and later, if necessary, repeating the entire set of calculations with a better estimate of $b$ obtained from the first set of calculations. The general conditions under which this iterative process will be convergent have not been studied, but no difficulty is anticipated in most practical applications. An estimated value of the quasi center of gravity may now be determined.
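The dependence of the weights and of the quasi center of gravity on the trial slope can be sketched numerically. The example below does not use the closed-form slope formula of the paper; as a stand-in it eliminates $a$ through (2.9) and minimizes $S$ over $b$ by a golden-section search, which is an assumption of this sketch, as are the function names and the search interval $0 \le b \le 2$. The data are those of Table 2.1 with a common $r$ for all points.

```python
from math import sqrt

X = [2.0, 6.0, 8.0]
Y = [2.0, 4.0, 8.0]
S2_X = [1.0, 4.0, 5.0]
S2_Y = [2.0, 3.0, 6.0]


def weights(b, r):
    """w_i from (2.6) for a trial slope b and common correlation r."""
    return [1.0 / (vy - 2.0 * b * r * sqrt(vx * vy) + b * b * vx)
            for vx, vy in zip(S2_X, S2_Y)]


def intercept(b, r):
    """a = Y_bar - b * X_bar from (2.9), using the quasi center of gravity."""
    w = weights(b, r)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, X)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, Y)) / sw
    return y_bar - b * x_bar


def profile_S(b, r):
    """S(a, b) of (2.7) with a already eliminated by (2.9)."""
    a = intercept(b, r)
    return sum(wi * (yi - a - b * xi) ** 2
               for wi, xi, yi in zip(weights(b, r), X, Y))


def fit_line(r, lo=0.0, hi=2.0, tol=1e-10):
    """Golden-section search for the slope minimizing profile_S on [lo, hi]."""
    g = (sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if profile_S(c, r) < profile_S(d, r):
            hi = d
        else:
            lo = c
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    b = 0.5 * (lo + hi)
    return intercept(b, r), b


a_r0, b_r0 = fit_line(0.0)   # compare with case 1 of Table 2.2
a_r9, b_r9 = fit_line(0.9)   # compare with case 5 of Table 2.2
```

For these data the search reproduces the tabulated general-case solutions for $r_i = 0$ and $r_i = +0.9$ to about three decimals. A direct search of this kind avoids the question of convergence of the iteration, at the cost of assuming a single minimum inside the chosen interval.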


Substituting the value of $a$ obtained from (2.9) into (2.1), we obtain:

$$(Y - \bar{Y}) = b\,(X - \bar{X}) \tag{2.10}$$

It will be convenient now to choose a new set of coordinates, $x = X - \bar{X}$ and $y = Y - \bar{Y}$, with their origin at the quasi center of gravity; then (2.10) and (2.7) become:
In the limit as $\sigma_{\epsilon i}^2 = 0$ or $\sigma_{\eta i}^2 = 0$, we obtain the following two solutions from (2.16) and (2.18), respectively:

$$b = \frac{[x_i y_i / s_{Yi}^2]}{[x_i^2 / s_{Yi}^2]} \qquad (\text{where } \sigma_{\epsilon i} = 0) \tag{2.20}$$

$$b = \frac{[y_i^2 / s_{Xi}^2]}{[y_i x_i / s_{Xi}^2]} \qquad (\text{where } \sigma_{\eta i} = 0) \tag{2.21}$$
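The two limiting solutions can be evaluated directly for the three points of Table 2.1. In the sketch below the helper names are assumptions; in the second case the true weights are $1/(b^2 s_{Xi}^2)$, but the common factor $1/b^2$ cancels in both the quasi center of gravity and the slope, so weights proportional to $1/s_{Xi}^2$ are used.

```python
X = [2.0, 6.0, 8.0]
Y = [2.0, 4.0, 8.0]
S2_X = [1.0, 4.0, 5.0]
S2_Y = [2.0, 3.0, 6.0]


def centered(w):
    """Quasi center of gravity and coordinates x, y measured from it."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, X)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, Y)) / sw
    return x_bar, y_bar, [xi - x_bar for xi in X], [yi - y_bar for yi in Y]


def fit_errors_in_y_only():
    """(2.20): sigma_eps = 0, weights 1 / s_Yi^2."""
    w = [1.0 / v for v in S2_Y]
    x_bar, y_bar, x, y = centered(w)
    b = (sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
         / sum(wi * xi * xi for wi, xi in zip(w, x)))
    return y_bar - b * x_bar, b


def fit_errors_in_x_only():
    """(2.21): sigma_eta = 0, weights proportional to 1 / s_Xi^2."""
    w = [1.0 / v for v in S2_X]
    x_bar, y_bar, x, y = centered(w)
    b = (sum(wi * yi * yi for wi, yi in zip(w, y))
         / sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)))
    return y_bar - b * x_bar, b


a_y, b_y = fit_errors_in_y_only()   # compare with case 7 of Table 2.2
a_x, b_x = fit_errors_in_x_only()   # compare with case 9 of Table 2.2
```

These are the ordinary weighted regressions of $Y$ on $X$ and of $X$ on $Y$, and they agree with cases 7 and 9 of Table 2.2 to the three decimals given there.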


Fortunately  in  some  cases  all  of  the  bivariate  distributions 
associated  with  the  n observation  points  may  be  considered  to  be 
samples  from  statistical  populations  with  the  same  variance.  In 
these  cases  it  will  be  advantageous  to  pool  the  sample  variances 
associated  with  the  observations  on  each  of  the  coordinates,  and  in 
this  way  obtain  improved  estimates  which  also  have  the  advantage  of 
being  independent  of  i.  These  pooled  estimates  may  be  calculated 
from  the  following  equations: 

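$$s_\epsilon^2 = \frac{[(m_i - 1)\, s_{\epsilon i}^2]}{[m_i] - n}; \qquad s_\eta^2 = \frac{[(m_i - 1)\, s_{\eta i}^2]}{[m_i] - n} \tag{2.22}$$

$$s_{Xi}^2 = s_\epsilon^2 / m_i; \qquad s_{Yi}^2 = s_\eta^2 / m_i \tag{2.23}$$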


The decision to pool the sample variances in the manner indicated above should be made, of course, only after tests have been made to see whether the $n$ individual sample variances may reasonably be considered to be from the same parent population. A good review of such tests is given in a publication of the Office of Scientific Research and Development 11/ and on page 196 of reference 9.
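As a small worked instance of (2.22) and (2.23), the sketch below pools the single-observation variances of Table 2.1. The function name is an assumption made for the example.

```python
def pooled_variance(m, s2):
    """(2.22): pooled variance of a single observation over the n groups."""
    n = len(m)
    return sum((mi - 1) * vi for mi, vi in zip(m, s2)) / (sum(m) - n)


m = [5, 5, 5]
s2_eps = pooled_variance(m, [5.0, 20.0, 25.0])    # X observations
s2_eta = pooled_variance(m, [10.0, 15.0, 30.0])   # Y observations

# (2.23): variances of the group means, now depending on i only through m_i.
s2_X = [s2_eps / mi for mi in m]
s2_Y = [s2_eta / mi for mi in m]
```

With equal $m_i$ the pooled values reduce to simple averages of the group variances, about 16.67 and 18.33 here.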

11/ Churchill Eisenhart, Millard W. Hastay, and W. Allen Wallis, "Techniques of Statistical Analysis," Chapter 15, McGraw-Hill Book Co., Inc., 1947.
A pooled estimate of the correlation coefficient, $r$, must also be obtained. Such a pooled estimate of $r$ may be obtained directly from:

$$r = \frac{\left[\sum_t (X_{it} - X_i)(Y_{it} - Y_i)\right]}{\sqrt{\left[\sum_t (X_{it} - X_i)^2\right]\left[\sum_t (Y_{it} - Y_i)^2\right]}} \tag{2.24}$$


It is often better, however, to make use of the individual sample values of $r_i$ so that they also may be tested to find out whether they are probably from the same population. For this purpose use may be made of a characteristic of sample correlation coefficients for normally distributed data discovered by Fisher. 12/ Fisher found that the statistic:

$$z_i = \frac{1}{2}\left[\log_e(1 + r_i) - \log_e(1 - r_i)\right] \tag{2.25}$$

is distributed almost normally with variance $(m_i - 3)^{-1}$. Snedecor 13/ discusses the use of this statistic for testing whether the individual values of $r_i$ are from the same population, and presents a very convenient graphical method for converting $r_i$ to $z_i$ and vice versa. Since $z_i$ is approximately normally distributed, a pooled estimate of $r$ may be obtained from the weighted average value:

$$\bar{z} = \frac{[(m_i - 3)\, z_i]}{[m_i - 3]} \tag{2.26}$$


which  is  then  converted  by  means  of  the  following  relation  to  the 
required  pooled  estimate  of  r: 


r = tanh 


(2.27) 


The  above  relation  allows  for  a small  bias  in  the  distribution  of  z;  this 
transcendental  equation  for  r may  be  solved  by  an  iterative  process, 
using  initially  an  estimate  of  r. 
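The pooling of the $r_i$ through (2.25) and (2.26) can be sketched as below. The small bias allowance of (2.27) is deliberately left out, so the back-transformation used here is the plain hyperbolic tangent of the weighted mean; that simplification, the function names, and the sample values are assumptions of this example.

```python
from math import log, tanh


def fisher_z(r):
    """(2.25): Fisher's transformation of a sample correlation coefficient."""
    return 0.5 * (log(1.0 + r) - log(1.0 - r))


def pooled_r_uncorrected(r_values, m_values):
    """Weighted mean of z_i per (2.26), converted back without bias correction."""
    num = sum((m - 3) * fisher_z(r) for r, m in zip(r_values, m_values))
    den = sum(m - 3 for m in m_values)
    return tanh(num / den)


same = pooled_r_uncorrected([0.9, 0.9, 0.9], [5, 5, 5])
mixed = pooled_r_uncorrected([0.5, 0.9], [5, 5])
```

Each group needs $m_i > 3$ for its weight to be positive. The result could serve as the initial estimate of $r$ in the iterative solution mentioned above.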


12/ R. A. Fisher, "On the Probable Error of a Coefficient of Correlation Deduced from a Small Sample," Metron, vol. 1, no. 4, 1921.

13/ George W. Snedecor, "Statistical Methods," Iowa State College Press, Fifth Edition, 1956, page 175.


When it is feasible to pool the variances and covariances as indicated above, it is no longer necessary to use an estimated value, $b_0$, to obtain a solution, and the above equations for the slope simplify to the following:
before the radical in (2.34) is perpendicular to the best fitting line if, and only if, $(C^2 - 1)[w_i x_i y_i] - \rho C [w_i (x_i^2 - y_i^2)] = 0$.

The locations $(X'_i, Y'_i)$ along the fitted lines on Figs. 2.1, 2.2, and 2.3 are called the "adjusted values" of these points, and formulas for their determination are given in the following section of this paper. The case shown on Fig. 2.3 is of particular interest since the line obtained, which actually passes above the point for $i = 1$, would probably not have been anticipated as a possibility by an analyst not equipped with the present theory. Note also in this case that the order of ascending magnitude, $X_1$, $X_2$, and $X_3$, is changed to $X'_1$, $X'_3$, and then $X'_2$ in the case of the "adjusted" values, and this illustrates the difficulty involved in the a posteriori ordering of the data as required in the Wald 5/ and Bartlett 6/ methods of analysis.

Table 2.2 gives the values of $a$ and $b$ as calculated for the general case illustrated on Figs. 2.1, 2.2, and 2.3, and for ten other special cases. Each of the pairs of values of $a$ and $b$ shown in Table 2.2 is a least squares solution for the parameters of the line fitting the same three points of Table 2.1, but with weights as described in the left-hand column of Table 2.2. Thus Table 2.2 provides examples of the effects of the several components in the weighting factor. Note in particular the large influence of $r_i$. Since the weights depend on a knowledge of the variances and covariances of a series of observations of the coordinates of the points, it is only when estimates of all of these variances and covariances are available that we may derive consistent conclusions from our least squares solution.
TABLE 2.2

No.   Method of Weighting                                                              $r_i$      a         b
 1    General Case (Table 2.1), Fig. 2.1                                                 0       -0.038    +0.879
 2    Weight independent of i (Pooled variance; $m_i$ = 5)                               0       -0.647    +0.996
 3    General Case (Table 2.1), Fig. 2.2                                                -0.9     -0.030    +0.880
 4    Weight independent of i (Pooled variance; $m_i$ = 5)                              -0.9     -0.657    +0.998
 5    General Case (Table 2.1), Fig. 2.3                                                +0.9     +0.894    +0.627
 6    Weight independent of i (Pooled variance; $m_i$ = 5)                              +0.9     -0.129    +0.899
 7    $s_{Xi}^2 = 0$; $s_{Yi}^2$ (from Table 2.1)                                        0       +0.151    +0.811
 8    $s_{Xi}^2 = 0$; Weight independent of i (Pooled variance for $s_{Yi}^2$; $m_i$ constant)   0    -0.286    +0.929
 9    $s_{Yi}^2 = 0$; $s_{Xi}^2$ (from Table 2.1)                                        0       -0.125    +0.938
10    $s_{Yi}^2 = 0$; Weight independent of i (Pooled variance for $s_{Xi}^2$; $m_i$ constant)   0    -1.077    +1.077
11    $\sigma_{\epsilon i}^2 = \sigma_{\eta i}^2$; $\rho = 0$; $w_i$ by (2.33) and (1-16)        0       -0.015    +0.877
12    $\sigma_{\epsilon i}^2 = \sigma_{\eta i}^2$; $\rho = -0.9$; $w_i$ by (2.33) and (1-16)    -0.9     -0.104    +0.893
13    $\sigma_{\epsilon i}^2 = \sigma_{\eta i}^2$; $\rho = +0.9$; $w_i$ by (2.33) and (1-16)    +0.9      0.423    +0.767
When Bartlett's test was applied to the variance estimates, $s_{\epsilon i}^2$ and $s_{\eta i}^2$ in Table 2.1, it was found that variations from the pooled estimates larger than those in Table 2.1 would be expected with random normally distributed samples of this size with probabilities of 0.57 and 0.33, respectively. Thus, in the absence of physical reasons for expecting different variances, it appears that these might well be samples from the same population, and thus may be pooled to obtain an improved estimate. The pooled variances of Table 2.1 are $s_\epsilon^2 = 16.666$ and $s_\eta^2 = 18.333$. These are the values used in obtaining the solutions given in Table 2.2 for this case for the three assumed values of $r$.
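A homogeneity check of this kind can be sketched with the commonly used form of Bartlett's statistic, which is assumed here and may differ in detail from the version of the test described in Section 8. For three groups the statistic is referred to chi-square with two degrees of freedom, whose upper tail is simply $e^{-\chi^2/2}$; the function name is an assumption.

```python
from math import exp, log


def bartlett_p_three_groups(variances, m):
    """Upper-tail probability of Bartlett's statistic for exactly three groups."""
    if len(variances) != 3:
        raise ValueError("closed-form tail used here is valid for three groups only")
    dof = [mi - 1 for mi in m]
    total_dof = sum(dof)
    pooled = sum(d * v for d, v in zip(dof, variances)) / total_dof
    M = total_dof * log(pooled) - sum(d * log(v) for d, v in zip(dof, variances))
    C = 1.0 + (sum(1.0 / d for d in dof) - 1.0 / total_dof) / (3.0 * (3 - 1))
    chi2 = M / C
    return exp(-chi2 / 2.0)   # chi-square tail with 2 degrees of freedom


p_eps = bartlett_p_three_groups([5.0, 20.0, 25.0], [5, 5, 5])    # s_eps_i^2 column
p_eta = bartlett_p_three_groups([10.0, 15.0, 30.0], [5, 5, 5])   # s_eta_i^2 column
```

Under this assumed form the two probabilities come out near 0.33 for the $s_{\epsilon i}^2$ column and near 0.57 for the $s_{\eta i}^2$ column of Table 2.1, the same pair of values quoted in the text; neither is small enough to argue against pooling.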

Our general formula (2.15) for the slope of the least squares straight line is quite complicated, but it should be noted that this complication is inherent to the problem and arises from the use of a very large amount of detailed information, i.e., 5 numerical values of $X_i$, $Y_i$, $s_{Xi}$, $s_{Yi}$, and $r_i$ associated with each point. It is thus natural to expect greater complication when more information is taken into account. In the special cases for which a smaller number of different data are necessary to describe the information available, the solutions (e.g., (2.20), (2.21), (2.28), (2.31), (2.32), and (2.34)) are correspondingly less complex. Furthermore, the above formulas are given primarily with the object of showing explicitly the ways in which the several components of the weights influence the slope. In a later section, matrix methods of solution are described, and these are recommended for use not only for the solution of the general case but for all of the simpler cases, since their use yields additional information as a by-product, such as the standard errors of the parameters.


* This particular test is described in Section 8.
In the above b denotes the value of b with the expected values substituted for the sample values. The above quadratic has the unique solution b = $\beta$ (corresponding to the positive square root) and thus, in this sense, b is an unbiased estimate of $\beta$. Note that this solution for b is independent of the sums and that b is an unbiased estimate of $\beta$ (in the sense described above) even for n = 1. The proof that b = $\beta$ in the general case with $\alpha \neq 0$ and with unequal weights is left as an exercise for the student.


Note that the solution for the above special case by simple regression analysis may be written:

$$b' = [X_i Y_i] / [X_i^2] \tag{2.40}$$

If we introduce the expected values of $[X_i Y_i]$ and of $[X_i^2]$ in (2.40) we obtain:

$$b' = \frac{\beta\, [X_{io}^2] + (n/m)\, \rho\, \sigma_\epsilon \sigma_\eta}{[X_{io}^2] + (n/m)\, \sigma_\epsilon^2} \tag{2.41}$$

Thus we see that regression analysis leads to an estimate of $\beta$ which is biased in the above-described sense unless $\sigma_\epsilon = 0$.
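The size of this bias is easy to evaluate from (2.41). In the sketch below the function name, the population means, and the error variances are illustrative assumptions, and a common $m$, $\rho$, $\sigma_\epsilon$, and $\sigma_\eta$ are taken for all points, as in the special case treated above.

```python
def expected_regression_slope(beta, X0, m, rho, sigma_eps, sigma_eta):
    """(2.41): regression slope with expected values substituted for sample values."""
    n = len(X0)
    sum_sq = sum(x * x for x in X0)
    numerator = beta * sum_sq + (n / m) * rho * sigma_eps * sigma_eta
    denominator = sum_sq + (n / m) * sigma_eps ** 2
    return numerator / denominator


X0 = [2.0, 6.0, 8.0]   # hypothetical population means X_io
no_x_error = expected_regression_slope(1.0, X0, 5, 0.0, 0.0, 3.0)
with_x_error = expected_regression_slope(1.0, X0, 5, 0.0, 2.0, 3.0)
```

With uncorrelated errors the expected slope is pulled toward zero by the factor $[X_{io}^2] / ([X_{io}^2] + (n/m)\sigma_\epsilon^2)$, here 104/106.4; a positive $\rho$ partly offsets this and a negative $\rho$ adds to it.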


3. THE ADJUSTED VALUES: $X'_i$ AND $Y'_i$

Since the minimized sum $S$ is invariant to the direction chosen for measuring the deviations of the observations, $X_i$, $Y_i$, from the line, we have no clue so far as to the expected locations along the fitted line of the adjusted values, $X'_i$, $Y'_i$. In fact, we see that the parameters $a$ and $b$ have been determined without reference to such adjusted values, which have usually played a prominent role in the derivation of previous least squares solutions. R. J. Adcock 15/ and later Karl Pearson 16/ defined the "closest fitting" line to be that which minimizes the sum of the squares of the perpendicular distances of the points, $X_i$, $Y_i$, from the line, but we shall see that this criterion is equivalent to the method of least squares only under very special conditions of weighting.

Since our method of determining the adjusted values involves the use of an invariant statistical form identical to that used by Hotelling 17/ 18/ in generalizing Student's ratio to the multivariate case, it seems useful to digress somewhat at this point in order to summarize his analysis. Hotelling's generalized T distribution may be used with normally distributed observational data to define an elliptical equi-probability curve with coordinates, $X^*$, $Y^*$, and with its center at the sample mean, $X_i$, $Y_i$; these ellipses determine confidence regions for the population means $X_{io}$, $Y_{io}$ on the $X^*$, $Y^*$ plane characterized by the probability, $1 - p_i(X^*, Y^*)$:


15/ R. J. Adcock, The Analyst (Des Moines), 1878, pp. 53-54.

16/ Karl Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space," Phil. Mag., 6 Ser., vol. 2, Nov., 1901, pp. 559-572.

17/ Harold Hotelling, "The Generalization of Student's Ratio," Annals of Math. Stat., vol. II, no. 3, Aug., 1931, pp. 360-378. The discussion of the degrees of freedom in this original article is not easy to follow. Better discussions are given in Reference 11, Chapter 3, by Harold Hotelling and in Reference 18, pages 407-409.

18/ Harald Cramér, "Mathematical Methods of Statistics," Princeton University Press, 1946.
$$p_i(X^*, Y^*) = \left(1 + \frac{T_i^2}{m_i - 1}\right)^{-(m_i - 2)/2} \tag{3.1}$$

and in the limit as $m_i \to \infty$

$$p_i(X^*, Y^*) = e^{-T_i^2/2} \tag{3.2}$$

where

$$T_i^2 = \frac{1}{(1 - r_i^2)}\left\{\frac{(Y^* - Y_i)^2}{s_{Yi}^2} - \frac{2 r_i (Y^* - Y_i)(X^* - X_i)}{s_{Yi}\, s_{Xi}} + \frac{(X^* - X_i)^2}{s_{Xi}^2}\right\} \tag{3.3}$$

If we make a practice of determining the location and size of these ellipses by solving (3.1) and (3.3) for the same probability, $p_i$, then, for a large number of such ellipses determined from normally distributed data, the population mean of the $i$th point will lie within a fraction $(1 - p_i)$ of them and will lie outside of a fraction $p_i$ of such ellipses. Although (3.1) and (3.2) are convenient for numerical calculations, it may be noted that $T_i^2 (m_i - 2)/2(m_i - 1)$ is distributed as the Fisher-Snedecor variance ratio $F\{2, (m_i - 2)\}$; tables and graphs of the significance levels $F(\nu_1, \nu_2, p)$ are given in a companion paper. 19/
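The probability attached to a given value of $T_i^2$ can be sketched as below. The finite-sample expression is written from the stated variance-ratio relation, using the closed form of the upper tail of $F$ with 2 and $(m_i - 2)$ degrees of freedom; the function names are assumptions of this example.

```python
from math import exp


def p_outside(T2, m):
    """Probability that the population mean lies outside the ellipse T_i^2 = T2,
    for a group of m observations (tail of F with 2 and m - 2 degrees of freedom)."""
    return (1.0 + T2 / (m - 1)) ** (-(m - 2) / 2.0)


def p_outside_limit(T2):
    """Limiting form as m grows without bound, per (3.2)."""
    return exp(-T2 / 2.0)


p_small_sample = p_outside(2.0, 5)
p_large_sample = p_outside(2.0, 10 ** 6)
```

For the same $T_i^2$ the small-sample probability is larger than the limiting value, i.e., an ellipse of given size carries less confidence when $m_i$ is small.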



We will now define the adjusted point $(X'_i, Y'_i)$ corresponding to the $i$th observed point $(X_i, Y_i)$ as the particular point $(X^*, Y^*)$ along the fitted line at which $T_i^2$† has its minimum value, $G_i^2$, and the probability $p_i(X^*, Y^*)$ has its maximum value, $p'_i = p_i(X'_i, Y'_i)$, i.e., the location corresponding to the values of $X^*$ and $Y^*$ which simultaneously minimize $T_i^2$ and satisfy the least squares fitted relation $Y = a + bX$. Using this relation to eliminate either $Y^*$ or $X^*$ from (3.3) and then differentiating the resulting expression with respect to the other variable, we obtain the following equations for the adjustments, i.e., the least squares estimated errors of each of the coordinates of each point.

$$X'_i - X_i = w_i (b\, s_{Xi}^2 - r_i\, s_{Xi}\, s_{Yi})\, V_{Yi} = w_i (b\, s_{Xi}^2 - r_i\, s_{Xi}\, s_{Yi})(Y_i - a - bX_i) \tag{3.4}$$

$$Y'_i - Y_i = -w_i (s_{Yi}^2 - r_i\, b\, s_{Xi}\, s_{Yi})\, V_{Yi} = -w_i (s_{Yi}^2 - r_i\, b\, s_{Xi}\, s_{Yi})(Y_i - a - bX_i) \tag{3.5}$$


† Note that $T_i^2$ is invariant to a homogeneous strain, translation, or rotation of the coordinate axes. See reference 17.

19/ L. E. Vogler and K. A. Norton, "Graphs and Tables of the Significance Levels $F(\nu_1, \nu_2, p)$ for the Fisher-Snedecor Variance Ratio," N. B. S. Report No. 5069, May, 1957.
Thus the adjusted values $(X'_i, Y'_i)$ of the $i$th point have been defined so as to yield consistent estimates of the population mean of this point, since they correspond to minimum values of $T_i^2$, and the corresponding ellipses converge on the population means of the $n$ points as all of the $m_i$ approach infinity.

Upon substitution of the particular values (3.4) and (3.5) in (3.3) we obtain, after some algebraic manipulation:

$$G_i^2 = w_i V_{Yi}^2 = w_i (Y_i - a - bX_i)^2 = w_i \{(Y'_i - Y_i) - b (X'_i - X_i)\}^2 \tag{3.6}$$

The last of the above expressions is easily established by noting that we may subtract $Y'_i - a - bX'_i = 0$ from $Y_i - a - bX_i$ without changing its value. Thus we see that the $i$th component of the minimized sum $S$ is identically equal to the minimized value, $G_i^2$, of Hotelling's invariant form $T_i^2$. Thus, regardless of the form of the statistical distribution of the observational data, our adjusted points are those lying on the least squares fitted relationship which have a minimum squared deviation from the observed points. In the case of data from normally distributed populations, it appears from the above derivation that our adjusted points may also be characterized as lying at their "most likely" locations along the fitted line.
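The adjustments (3.4) and (3.5) and the identity (3.6) can be checked numerically. The sketch below uses the points of Table 2.1 with the case 5 solution of Table 2.2 ($r_i = +0.9$, $a = 0.894$, $b = 0.627$); the function names are assumptions made for the example.

```python
from math import sqrt

X = [2.0, 6.0, 8.0]
Y = [2.0, 4.0, 8.0]
S2_X = [1.0, 4.0, 5.0]
S2_Y = [2.0, 3.0, 6.0]
a, b, r = 0.894, 0.627, 0.9   # case 5 of Table 2.2


def adjust(Xi, Yi, vx, vy):
    """Adjusted point per (3.4), (3.5) and the component w_i * V_Yi^2 of S."""
    cov = r * sqrt(vx * vy)
    w = 1.0 / (vy - 2.0 * b * cov + b * b * vx)      # (2.6)
    V = Yi - a - b * Xi
    X_adj = Xi + w * (b * vx - cov) * V              # (3.4)
    Y_adj = Yi - w * (vy - b * cov) * V              # (3.5)
    return X_adj, Y_adj, w * V * V


def t_squared(Xs, Ys, Xi, Yi, vx, vy):
    """Hotelling's form (3.3) evaluated at the point (Xs, Ys)."""
    dx, dy = Xs - Xi, Ys - Yi
    return (dy * dy / vy - 2.0 * r * dx * dy / sqrt(vx * vy)
            + dx * dx / vx) / (1.0 - r * r)


adjusted = [adjust(*p) for p in zip(X, Y, S2_X, S2_Y)]
on_line = [abs(Ya - a - b * Xa) for Xa, Ya, _ in adjusted]
gap = [abs(t_squared(Xa, Ya, Xi, Yi, vx, vy) - G2)
       for (Xa, Ya, G2), Xi, Yi, vx, vy in zip(adjusted, X, Y, S2_X, S2_Y)]
X_adj = [p[0] for p in adjusted]
```

The adjusted points fall on the fitted line, the value of $T_i^2$ at each adjusted point equals $w_i V_{Yi}^2$, and for this case the adjusted abscissas come out in the order first, third, second point, so the ordering of the observed $X_i$ is not preserved.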

Since $G_i^2$ is a particular value of $T_i^2$, it follows that $G_i^2$ and $S = [G_i^2]$ will have the same invariance properties that Hotelling 17/ has established for $T_i^2$; thus $G_i^2$ is invariant to an affine transformation, i.e., is invariant to a homogeneous strain, translation, or rotation of the coordinate axes. This proves the statement made in connection with (2.7). This invariance also makes all least squares solutions subject to a very significant limitation from the point of view of the physicist. Let us suppose that all of the observations of the random variables $X_{it}$ and $Y_{it}$ ($t = 1$ to $\infty$ and $i = 1$ to $\infty$) are subject to "systematic" errors in the form of constant biases $u_o$ and $v_o$, respectively. Each of the population mean values $X_{io}$ and $Y_{io}$ will then differ from the "true" physical values by the amounts of these constant biases. It will clearly be impossible by least squares to detect such constant bias, since the introduction of such a bias is equivalent to a translation of the X coordinate axis by an amount $u_o$ and of the Y coordinate axis by an amount $v_o$, and our solution is invariant to such a translation. Thus, throughout our development of least squares solutions it should be remembered that the functional relations estimated are between the population mean values $X_{io}$ and $Y_{io}$, and these may, by virtue of constant biases, differ from the "true" values of $X$ and $Y$.


If we set (3.3) equal to the values of $G_i^2$ given by (3.6), we obtain a formula for the particular ellipse with coordinates $(X^*, Y^*)$ and of probability $(1 - p'_i)$ of containing the "true" mean; such ellipses are shown on Figs. 2.1, 2.2, and 2.3 for the example defined in Table 2.1 and for the cases $r_i$ = 0, -0.9, and +0.9, respectively; note that these ellipses are tangent to the least squares fitted line at the adjusted locations, $(X'_i, Y'_i)$, of the $n$ points. Thus we have established the remarkable result that our least squares solution, independently of the form assumed for the statistical distributions of the observational data, may be considered to be exactly equivalent to the solution of the geometrical problem of finding the parameters of the particular linear functional relationship which is tangent to the $n$ ellipses defined by Hotelling's generalized $T_i^2$ associated with the $n$ observational points and for which the sum $[T_i^2]$ has its minimum value $[G_i^2] = S(a, b)$.† Since these ellipses each converge on the population mean values $(X_{io}, Y_{io})$ as the $m_i$ approach infinity, and the latter are related by the functional relationship involving the true values $\alpha$ and $\beta$ of the parameters, it follows that our least squares solution leads to consistent estimates of these parameters as the $m_i$ approach infinity.


It is of some interest to consider under what circumstances the adjusted locations are on the perpendiculars to the line drawn from the points $X_i$, $Y_i$. It is evident from Figs. 2.1, 2.2, and 2.3 that this will be the case only when the ellipses degenerate to circles ($s_{Xi} = s_{Yi}$ and $r_i = 0$) or when the minor (or major) axes of the ellipses are parallel to the fitted line; such conditions of weighting would seldom be expected in practice. It should be noted that Karl Pearson did not claim 16/ that his line of closest fit was determined by the method of least squares; he merely gave formulas for fitting the line which minimized the sum of the squares of the perpendicular distances of the points from it. Thus his solution would appear to have more geometrical than statistical significance.


† This conclusion depends upon the assumption that unique solutions exist for both the least squares and the geometrical problems; all efforts made to date have failed to develop a set of data for which a unique solution does not exist.
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4.  THE  GENERALIZED  NORMAL  EQUATIONS  FOR  DETERMINING 
THE  u UNKNOWN  PARAMETERS  OF  A FUNCTION  F INVOLVING 
k»  VARIABLES  SOME  OF  WHICH  MAY  ENTER  F NON-LINEARLY 
AND  ASSUMING  NO  SYSTEMATIC  ERRORS 


When  more  than  one  unknown  parameter  is  to  be  determined 
by  least  squares,  the  solution  is  most  easily  obtained  by  matrix 
methods  (see  Section  6)  for  solving  the  u "normal  equations".  In 
this  section  generalized  normal  equations  will  be  developed  which 
will  be  applicable  to  the  problem  of  determining  the  u unknown 
parameters  of  a function  F involving  k’  variables,  only  k of  which 
are  random. 


It is assumed that n sets of observations ($n \ge u$) are available
which define n points in the k'-dimensional space, i.e.,
$\bar{X}_{ji}$ (j = 1 to k' and i = 1 to n), together with nk ($k \le k'$) sample mean
variances $s_{ji}^2 = s_{\eta ji}^2/m_i$ (j = 1 to k) and nk(k - 1)/2 different sample
correlation coefficients $r_{jhi}$ ($j \ne h$); actually a total of $k[m_i]$ individual
observations of the k random variables are required to define these
n sample points and their sample variances and covariances. The k
random variables are each assumed to have statistical properties
similar to those postulated for the two random variables in Section 2;
thus $X_{jit} = X_{jio} + \eta_{jit}$ with the $\eta_{jit}$ independent with respect to the
subscripts i and t, $E(\eta_{jit}^2) = \sigma_{\eta ji}^2$, $E(\eta_{jit}\,\eta_{hit}) = \rho_{jhi}\,\sigma_{\eta ji}\,\sigma_{\eta hi}$, j = 1 to k,
h = 1 to k, and i = 1 to n; $\sigma_{ji}^2 = \sigma_{\eta ji}^2/m_i$. Furthermore, it is assumed
that a function $F(X_{1io}, \ldots, X_{jio}, \ldots, X_{k'io}, a_1, \ldots, a_p, \ldots, a_u) = 0$
exists which passes through the population mean values $X_{jio}$ (j = 1 to k')
of these n points; this function involves u unknown parameters and
may be non-linear in one or more of the variables or the parameters.


If it is known that the function $F(X_{jh})$ must pass through v (v < u)
specified points exactly (such points may be considered to be points of
infinite weight, i.e., $\sigma_{jh} = 0$ (j = 1 to k and h = 1 to v)), then it will usually
be possible to eliminate v of the u unknown parameters from F at the
outset, and thus obtain a modified function H involving only (u - v)
parameters to be estimated by least squares. For example, if the
straight line represented by the function $F = \{Y - \alpha - \beta X\} = 0$ is known
to pass through the point (Y = 1, X = 1), we may fit the simpler function
$H = \{Y - 1 - \beta(X - 1)\} = 0$ which now involves only the single parameter
$\beta$. Such a reduction in the number of unknown parameters should
be accomplished whenever this is feasible. Throughout this paper
it has been implicitly assumed that F denotes this modified function
H with the minimum number of unknown parameters.

The principle of least squares provides a method for determining
consistent estimates, $a_p$, of the u parameters by
minimizing the weighted sum of the squares of
$F_i = F(\bar{X}_{1i}, \ldots, \bar{X}_{ji}, \ldots, \bar{X}_{k'i}, a_1, \ldots, a_p, \ldots, a_u)$ evaluated at the
n points:

\[ S = [w_i F_i^2] = [(F_i/s_i)^2] = \text{minimum} \qquad (4.1) \]

It will be convenient and will lead to consistent estimates of the
parameters and to an invariant form for S if we define the weight
$w_i$ of the deviation $F_i$ to be the reciprocal of its variance $\sigma_i^2$ as
estimated by the following particular approximate formula:

\[ (1/w_i) = s_i^2 = \sum_{j=1}^{k} \sum_{h=1}^{k} \frac{\partial F_i}{\partial X_{ji}} \, \frac{\partial F_i}{\partial X_{hi}} \, r_{jhi} \, s_{ji} \, s_{hi} \qquad (4.2) \]

In the above $r_{jhi} = 1$ when j = h. Part of the approximation in the
above expression arises from the fact that only the linear terms in
the Taylor's series expansion of $F_i$ were retained in its derivation.
In many cases (all linear functional relations, for example) there
may be only first derivatives, and then such an approximation is not
involved. In other cases, provided we may also assume that the
variables $X_j$ are normally distributed, additional terms in the Taylor's
series may be retained and improved weights determined; this latter
method of obtaining improved weights is explained in a later section
in connection with the discussion of a particular problem, but the
improvement possible with its use is seldom worth the additional
computational work required for its application.
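As a rough illustration of the weight formula (4.2), the weight of a single point can be computed from the gradient of F with respect to the random variables and the sample moment matrix of the observed means. The sketch below uses Python with NumPy; the function name `point_weight`, its argument layout, and the numerical values are assumptions made only for illustration and do not come from the text.

```python
import numpy as np

def point_weight(grad_F, s, r):
    """Weight w_i = 1/s_i^2 of one point, following (4.2).

    grad_F : dF_i/dX_ji for the k random variables, shape (k,)
    s      : standard errors s_ji of the k observed means, shape (k,)
    r      : sample correlation matrix r_jhi (ones on the diagonal), shape (k, k)
    """
    grad_F = np.asarray(grad_F, dtype=float)
    s = np.asarray(s, dtype=float)
    # Moment matrix with elements r_jhi * s_ji * s_hi
    cov = np.asarray(r, dtype=float) * np.outer(s, s)
    # Double sum over j and h in (4.2)
    s_i_sq = grad_F @ cov @ grad_F
    return 1.0 / s_i_sq

# Straight line F = Y - a - b*X with the variables ordered (X, Y),
# so that dF/dX = -b and dF/dY = 1 (hypothetical numbers).
b = 2.0
w = point_weight([-b, 1.0], s=[0.1, 0.2], r=[[1.0, 0.3], [0.3, 1.0]])
```

For the straight line this reduces to $1/w = s_Y^2 - 2 b r s_X s_Y + b^2 s_X^2$, so the weight depends on the slope b, which is the source of the additional terms in the generalized normal equations.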




Upon substitution of (4.2) in (4.1) we may formally express
our solution of the general problem of least squares as the simultaneous
solution of the following u equations which involve as unknowns only
the u parameters $a_p$:

\[ \frac{1}{2} \, \frac{\partial S}{\partial a_p} = 0, \qquad p = 1 \text{ to } u \qquad (4.3) \]

These u parameters will, in general, enter $F_i$, $w_i$, and consequently
S in a non-linear way, and this will complicate the simultaneous
solution of (4.3). The estimates of the parameters obtained by using
(4.3) will always be consistent estimates of the true values of these
parameters, although not necessarily statistically unbiased estimates.
Furthermore, as was pointed out in the previous section, least squares
leads to the best relations between the population mean values $X_{jio}$,
and these may differ from the "true" values by virtue of systematic
biases.


We have seen in Section 2 that a direct solution for the parameters
even in the simple linear case involves the evaluation of rather
complicated expressions. The solution of the equations (4.3) is often
even more complex, and it becomes desirable to develop simpler methods
of approach. The following method is more general than that originally
developed for this purpose by Gauss, but becomes identical to Gauss'
method in those cases for which his solution is adequate. This
generalization not only provides consistent estimates for the parameters, but
also more accurate expressions for the probable errors of these
estimated values of the parameters and for extrapolated values of the
function in those cases for which either (1) the expression for the
weighting factor is dependent on the parameters or (2) the function F is
non-linear in one or more of the variables. The following method is
similar, in some respects, to that of Kummell.


4.1  The Generalized Normal Equations

First set $G_i^2 = w_i F_i^2 = (F_i/s_i)^2$. The value of S as given by
(4.1) is then expanded in a Taylor's series:

\[ S = [G_i^2]_o + \sum_{p=1}^{u} (a_p - a_{po}) \left[ \frac{\partial G_i^2}{\partial a_p} \right]_o + \frac{1}{2} \sum_{s=1}^{u} \sum_{t=1}^{u} (a_s - a_{so})(a_t - a_{to}) \left[ \frac{\partial^2 G_i^2}{\partial a_s \, \partial a_t} \right]_o \qquad (4.4) \]

The above expression for S is exact to quantities of the second
order in $(a_p - a_{po})$. The subscript o indicates that these sums are
to be evaluated using approximate values of the estimated parameters,
$a_{po}$. The solution (4.3) may now be written:

\[ \frac{1}{2} \, \frac{\partial S}{\partial a_p} = \frac{1}{2} \left[ \frac{\partial G_i^2}{\partial a_p} \right]_o + \frac{1}{2} \sum_{s=1}^{u} (a_s - a_{so}) \left[ \frac{\partial^2 G_i^2}{\partial a_p \, \partial a_s} \right]_o = 0 \qquad (4.5) \]

The u equations (4.5) are the generalized normal equations; it may
be noted that they are linear in the u unknown quantities $(a_p - a_{po})$.
Although they are approximate, this approximation may be made as
small as the computer wishes to make it by means of the process of
iterative solution, using successively closer approximations for the
values $a_{po}$ in evaluating the sums in the square brackets.

† In these equations $G_i^2$ is considered to be a function of the
approximate values $a_{po}$ used for its evaluation, but is considered
in most of the remainder of this paper to be the particular minimized
value obtained with $a_{po} = a_p$.

It will be convenient to write these generalized normal
equations in the form:

\[
\begin{aligned}
(a_1 - a_{10}) A_{11} + \ldots + (a_p - a_{po}) A_{1p} + \ldots + (a_u - a_{uo}) A_{1u} &= A_{10} \\
&\;\;\vdots \\
(a_1 - a_{10}) A_{p1} + \ldots + (a_p - a_{po}) A_{pp} + \ldots + (a_u - a_{uo}) A_{pu} &= A_{po} \\
&\;\;\vdots \\
(a_1 - a_{10}) A_{u1} + \ldots + (a_p - a_{po}) A_{up} + \ldots + (a_u - a_{uo}) A_{uu} &= A_{uo}
\end{aligned}
\qquad (4.6)
\]

where:

\[ A_{pq} = \frac{1}{2} \left[ \frac{\partial^2 G_i^2}{\partial a_p \, \partial a_q} \right]_o = \left[ \frac{\partial G_i}{\partial a_p} \, \frac{\partial G_i}{\partial a_q} \right]_o + \left[ G_i \, \frac{\partial^2 G_i}{\partial a_p \, \partial a_q} \right]_o \qquad (4.7) \]

\[ A_{po} = -\frac{1}{2} \left[ \frac{\partial G_i^2}{\partial a_p} \right]_o = -\left[ G_i \, \frac{\partial G_i}{\partial a_p} \right]_o \qquad (4.8) \]

Note that $A_{pq} = A_{qp}$ and that the $A_{pq}$ and $A_{po}$ are specific numerical
values which may be calculated from the given values of the
observations and weights, together with the assumed values of $a_{po}$. The
reason for dividing $A_{pq}$ into two parts, as in (4.7), will appear in the
next section. As successively better approximations to the parameters
are determined by solving (4.6), these improved estimates may be used
for re-evaluating the quantities $A_{pq}$ and $A_{po}$. In the limit as $a_{po}$
approaches $a_p$, $A_{po}$ approaches zero; thus the values of $A_{po}$ provide
convenient measures of the degree of convergence attained at various
stages of the iterative process.
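One step of the iterative solution of the generalized normal equations (4.6) can be sketched as follows, assuming the normalized deviations $G_i$ and their first and second derivatives with respect to the parameters have already been evaluated at the approximate values $a_{po}$. The function name `generalized_normal_step` and the array layout are hypothetical choices for this illustration; NumPy's linear solver stands in for the tabular method the paper describes in Section 6.

```python
import numpy as np

def generalized_normal_step(G, dG, d2G):
    """Solve (4.6) once for the corrections (a_p - a_po).

    G   : normalized deviations G_i at a_po, shape (n,)
    dG  : first derivatives dG_i/da_p at a_po, shape (n, u)
    d2G : second derivatives d2G_i/(da_p da_q) at a_po, shape (n, u, u)
    Returns (delta, A, A0) where delta[p] = a_p - a_po.
    """
    G = np.asarray(G, dtype=float)
    dG = np.asarray(dG, dtype=float)
    d2G = np.asarray(d2G, dtype=float)
    # (4.7): product of first derivatives plus the term in second derivatives
    A = dG.T @ dG + np.einsum("i,ipq->pq", G, d2G)
    # (4.8): right-hand sides, which vanish at the least squares solution
    A0 = -dG.T @ G
    delta = np.linalg.solve(A, A0)
    return delta, A, A0

# Hypothetical one-parameter example: G_i = a - c_i evaluated at a_o = 0.
c = np.array([1.0, 2.0, 3.0])
a_o = 0.0
delta, A, A0 = generalized_normal_step(a_o - c, np.ones((3, 1)), np.zeros((3, 1, 1)))
```

In this linear example a single step lands on the minimum (the mean of the `c` values); in the non-linear case the step would be repeated with the updated parameters until the elements of `A0` are sufficiently near zero.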

These  generalized  normal  equations  differ  from  the  usual 
normal  equations  used  by  all  previous  writers,  with  the  exception  of 
Kummell,  since  they  introduce  the  necessary  additional  terms  which 
arise  from  the  differentiation  of  the  weights. 

As our first illustration (4.5) will be applied to the linear
problem of Section 2 in which the observations of both variables are
subject to error. Note that $F_i = Y_i - a - bX_i$ and that $(1/w_i)$ is given
by (2.6). Since there are only two parameters, the generalized
normal equations for this problem may be expressed:

\[ (a - a_o) A_{11} + (b - b_o) A_{12} = A_{10} \qquad (4.9) \]

\[ (a - a_o) A_{21} + (b - b_o) A_{22} = A_{20} \qquad (4.10) \]

\[ A_{11} = [w_i]_o \qquad (4.11) \]

\[ A_{12} = A_{21} = \left[ w_i X_i + 2 w_i^2 F_i \,(b s_{Xi}^2 - r_i s_{Xi} s_{Yi}) \right]_o \qquad (4.12) \]

\[ A_{22} = \Big[ w_i X_i^2 + 4 w_i^2 F_i X_i \,(b s_{Xi}^2 - r_i s_{Xi} s_{Yi}) \qquad (4.13) \]

\[ \qquad\qquad + 4 w_i^3 F_i^2 \,(b s_{Xi}^2 - r_i s_{Xi} s_{Yi})^2 - w_i^2 F_i^2 s_{Xi}^2 \Big]_o \qquad (4.14) \]

\[ A_{10} = [w_i F_i]_o \qquad (4.15) \]

\[ A_{20} = \left[ w_i X_i F_i + w_i^2 F_i^2 \,(b s_{Xi}^2 - r_i s_{Xi} s_{Yi}) \right]_o \qquad (4.16) \]

Comparing the above generalized normal equations for fitting points
to a straight line with those obtained by the usual normal equations
as given in most textbooks, we find that the latter yield only the
leading terms in (4.12), (4.13), and (4.15) and will thus lead to an
erroneous solution, i.e., to a solution which does not minimize S
in this case for which both variables are subject to error. Note
that a value for $a_o$ can be determined for a given value of $b_o$ by
setting $A_{10} = 0$. However, if such a step is introduced regularly into
the iterative process, it will not converge; it is necessary to use
estimates of both a and b, obtained from the preceding step in the
process, in calculating the coefficients for the following step.
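The straight-line case can be sketched in code as below. This is a minimal illustration, not the tabular procedure of the paper: it assumes the weight is $1/w_i = s_{Yi}^2 - 2 b r_i s_{Xi} s_{Yi} + b^2 s_{Xi}^2$ (which is what (4.2) gives for $F_i = Y_i - a - bX_i$), evaluates the coefficients of the generalized normal equations for the line with all terms retained, and repeats the solution with both a and b updated at every step. The function names, the convergence tolerance, and the test data are assumptions.

```python
import numpy as np

def line_coefficients(a, b, X, Y, sX, sY, r):
    """Coefficient matrix, right-hand sides and S for the line Y = a + b*X."""
    D = b * sX**2 - r * sX * sY                       # (b s_X^2 - r s_X s_Y)
    w = 1.0 / (sY**2 - 2.0 * b * r * sX * sY + b**2 * sX**2)
    F = Y - a - b * X
    A11 = np.sum(w)
    A12 = np.sum(w * X + 2.0 * w**2 * F * D)
    A22 = np.sum(w * X**2 + 4.0 * w**2 * F * X * D
                 + 4.0 * w**3 * F**2 * D**2 - w**2 * F**2 * sX**2)
    A10 = np.sum(w * F)
    A20 = np.sum(w * X * F + w**2 * F**2 * D)
    S = np.sum(w * F**2)
    return np.array([[A11, A12], [A12, A22]]), np.array([A10, A20]), S

def fit_line(X, Y, sX, sY, r, a0, b0, tol=1e-12, max_iter=50):
    """Iterate the two normal equations, updating a and b together."""
    a, b = a0, b0
    for _ in range(max_iter):
        A, rhs, _ = line_coefficients(a, b, X, Y, sX, sY, r)
        da, db = np.linalg.solve(A, rhs)
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Hypothetical data lying exactly on Y = 1 + 2X, both variables subject to error.
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = 1.0 + 2.0 * X
sX = np.full(4, 0.1)
sY = np.full(4, 0.1)
r = np.zeros(4)
a_hat, b_hat = fit_line(X, Y, sX, sY, r, a0=0.9, b0=1.9)
```

Dropping the terms in `D` from `A12`, `A22`, and `A20` gives the usual textbook normal equations, which is the comparison the text makes.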


Although equations in two unknowns such as (4.9) and (4.10)
are easily solved by a variety of methods (see Dwyer; Anderson and
Bancroft), Deming's systematic solution, as presented in Chapter IX
of his book, 7/ is particularly to be recommended since he obtains as
a by-product an evaluation of S. This method is described in Section 6
of this paper. If we let M denote the determinant formed from the
coefficients of the parameters in the generalized normal equations (4.6)

\[ M = |A_{pq}| \qquad (4.16) \]

and let $R_{pq}$ denote the corresponding elements of the reciprocal
or image determinant, then the solution of the generalized normal
equations may be expressed:

\[ (a_p - a_{po}) = \sum_{q=1}^{u} R_{pq} A_{qo} \qquad (4.17) \]

There will be one such equation for each value of p (p = 1 to u).


† For example, although Deming 7/ gives on page 184 the
correct expression, equivalent to (2.34) in this paper, for b when

O'  . = C O'  .,  this  result  will  not  be  obtained  from  the  normal  equa- 
a Tu 

tions  developed  in  his  book,  since  he  neglected  higher  powers  of  the 
residuals  in  approximating  his  (3)  on  page  50  by  (7)  on  page  53.  We 
see  by  the  above  equations  that  the  second  order  terms  in  the 
residuals,  F^,  and  terms  arising  from  the  differentiation  of  the 
weights  control  the  solution  when  both  variables  are  subject  to  error. 

Paul  S.  Dwyer,  "Linear  Computations,  " John  Wiley  and 

Sons,  1951. 

R.  L.  Anderson  and  T.  A.  Bancroft,  "Statistical  Theory 
in  Research,  " McGraw-Hill  Book  Co.,  Inc.,  1952. 
It will become evident in the next section that the values,
$R_{pq}$, of the elements of the reciprocal matrix are also required for
the evaluation of the standard errors of the parameters, and
consequently their determination serves a double purpose.


4.2  The Adjusted Values $X'_{ji}$

When the least squares estimates of the u parameters, $a_p$,
have been determined, the following formula may be used to
determine the k' coordinates of the adjusted values, $X'_{ji}$ (j = 1 to k'),
for each of the n points (i = 1 to n):

\[ X'_{ji} - \bar{X}_{ji} = -w_i F_i \, s_{ji} \sum_{h=1}^{k} \frac{\partial F_i}{\partial X_{hi}} \, r_{jhi} \, s_{hi} \qquad (4.18) \]

Since $\sigma_{ji} = 0$ for the non-random variables,
$X'_{ji} = \bar{X}_{ji}$ for j = k + 1 to k'.

The above general equation for the adjusted values was derived by the
same general procedure used in deriving (3.4) and (3.5).


We will now introduce Hotelling's invariant form $T^2$ for the
multivariate case and thus establish the exact equivalence of our
general least squares solution to the geometrical problem of finding
the parameters of the particular functional relationship whose curve
in the k'-dimensional space is tangent to the n hyper-ellipsoids defined
by the $T_i^2$ associated with the n observational points and for which the
sum $[T_i^2]$ has its minimum value $[G_i^2] = S(a_1, \ldots, a_p, \ldots, a_u)$.
Let $L_i = |r_{jhi} s_{ji} s_{hi}|$ denote the value of the determinant of the observed
moment matrix describing the i-th data point, and let $L_i^{jh}$ denote the
corresponding elements of the image, or reciprocal, determinant.
Note that the elements of the determinant $L_i$ are determined by the
$m_i$ observations taken under the fixed experimental conditions
corresponding to the i-th point:

\[ r_{jhi} \, s_{ji} \, s_{hi} = \frac{1}{m_i (m_i - 1)} \sum_{t=1}^{m_i} (X_{jit} - \bar{X}_{ji})(X_{hit} - \bar{X}_{hi}) \qquad (4.19) \]



We may now express Hotelling's invariant form $T_i^2$ in the following
form:

\[ T_i^2 = \sum_{j=1}^{k} \sum_{h=1}^{k} L_i^{jh} \, (X^*_{ji} - \bar{X}_{ji})(X^*_{hi} - \bar{X}_{hi}) \qquad (4.20) \]

Each value of $T_i^2$ defines a hyper-ellipsoid in the k-dimensional space
with its center at the location $\bar{X}_{ji}$ (j = 1 to k) of the i-th observation
point. Regardless of the form of the statistical distribution of the
observational data, all points $X^*_{ji}$ (j = 1 to k) on the surfaces of these
hyper-ellipsoids have the same weighted squared deviation from the
mean observed values, $\bar{X}_{ji}$. For data from normally distributed
populations, the exact sampling distribution of $T_i^2$ is known, and in
this case the surfaces of these hyper-ellipsoids can be characterized
as surfaces of equi-probability. Thus the quantity $T_i^2 (m_i - k)/k(m_i - 1)$
is distributed as the Fisher-Snedecor variance ratio $F(k, m_i - k)$.
If we set $T_i^2 (m_i - k)/k(m_i - 1) = F(k, m_i - k, p_i)$ we may construct
hyper-ellipsoidal confidence regions $X^*_{ji}$ (j = 1 to k) centered on the
observed mean $\bar{X}_{ji}$ (j = 1 to k) which are expected to contain the
population mean $X_{jio}$ (j = 1 to k) with a confidence $(1 - p_i)$.

If we substitute the adjusted values $X'_{ji}$ (j = 1 to k) as given by
(4.18) for the $X^*_{ji}$ in (4.20) we obtain an expression for the particular
value $G_i^2$ corresponding to the particular hyper-ellipsoid centered on
the i-th point which is just tangent to the least squares fitted functional
relationship at the adjusted location of the i-th point:

\[ G_i^2 = w_i F_i^2 \qquad (4.21) \]

The reader should make the above substitution, carry out the
necessary algebraic manipulations, and thus satisfy himself as to the
generality of (4.21). Since these hyper-ellipsoids converge on the
true location of the i-th point as $m_i$ approaches infinity, we see that our
least squares solution yields consistent estimates of the adjusted values
and, since the adjusted values lie on the fitted function, consistent
estimates of its parameters. The above geometrical argument evidently
cannot be used to show that consistent estimates are obtained when
the $m_i$ are finite and n approaches infinity. In this latter case the
hyper-ellipsoids will each have different finite magnitudes and,
although it seems plausible, it is not intuitively clear that consistent
estimates of the parameters of the functional relationship will
necessarily be obtained as the number n of these finite hyper-ellipsoids
approaches infinity. The proof of this latter consistency property
of our solution is not available, but it is intuitively clear that the n
points must be more or less uniformly distributed over an adequate
range of the variates if consistent estimates of the parameters are to
be obtained as n approaches infinity.
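The substitution leading to (4.21) can also be checked numerically. The sketch below assumes that the adjustment (4.18) has the form $X'_{ji} - \bar{X}_{ji} = -w_i F_i \sum_h (\partial F_i/\partial X_{hi}) \, r_{jhi} s_{ji} s_{hi}$, with the weight taken from (4.2), and that $T_i^2$ is the quadratic form in the reciprocal of the moment matrix. The function names and the numbers are hypothetical.

```python
import numpy as np

def adjusted_shift(F, grad_F, cov):
    """Shift X'_j - Xbar_j of one point (assumed form of (4.18)) and its weight.

    F      : value of the function at the observed mean
    grad_F : dF/dX_j for the k random variables, shape (k,)
    cov    : moment matrix with elements r_jh * s_j * s_h, shape (k, k)
    """
    w = 1.0 / (grad_F @ cov @ grad_F)        # weight from (4.2)
    return -w * F * (cov @ grad_F), w

def hotelling_T2(shift, cov):
    """Quadratic form (4.20) for a displacement from the observed mean."""
    return shift @ np.linalg.solve(cov, shift)

# Line F = Y - a - b*X at an observed mean (X, Y) = (1.0, 3.5); variables ordered (X, Y).
a, b = 1.0, 2.0
X_obs, Y_obs = 1.0, 3.5
cov = np.array([[0.01, 0.006], [0.006, 0.04]])
F = Y_obs - a - b * X_obs
grad = np.array([-b, 1.0])
shift, w = adjusted_shift(F, grad, cov)
T2 = hotelling_T2(shift, cov)
F_adjusted = (Y_obs + shift[1]) - a - b * (X_obs + shift[0])
```

Under these assumptions `T2` agrees with `w * F**2`, and for a function linear in the variables the adjusted point lies exactly on the line.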
5.  ESTIMATES OF THE EXPECTED VARIANCES OF THE
ESTIMATED PARAMETERS AND OF THE ESTIMATED
FUNCTIONAL RELATIONSHIP


One of the major advantages of least squares over some other
methods of fitting data to an assumed functional relationship is the
possibility of determining standard errors for the estimated
parameters as well as standard errors for the functional relationship itself.
The theory of all this goes back to Gauss, who showed how the standard
errors of the parameters could be determined directly from the
reciprocal matrix of the coefficients of the parameters in the normal
equations. Good discussions of this theory are given by Whittaker and
Robinson 22/ and by Chauvenet. 23/ The modifications to this theory
which are necessary when the generalized normal equations are used
will be given below. We will see that the basis for the theory is that
the normal equations provide a set of linear relations between the
errors $F_i$ and the estimates of the parameters. In the particular case
of fitting data to a linear relationship in any number of dimensions
but with the values of only one of the variables subject to error, the
estimates of the parameters and the errors $F_i$ of the function are
linearly related, and Neyman and David 24/ have presented a proof
that Gauss' method of least squares provides the best linear unbiased
estimates of the parameters and their standard errors. We have seen
above, however, that the estimates of the parameters are not exactly
linearly related to the errors $F_i$ in many other cases, and we conclude
that the Gauss method will give only approximate results in these other
cases. An improvement in accuracy is gained by using the generalized
rather than the usual normal equations in accordance with the method
outlined below, but there appears to be no simple way to obtain
completely unbiased estimates of the standard errors of the parameters
in the general case.


22/ E. T. Whittaker and G. Robinson, "The Calculus of
Observations," Blackie and Son, Limited, London, 1924, pp. 226-259.

23/ William Chauvenet, "Manual of Spherical and Practical
Astronomy," J. B. Lippincott Company, Philadelphia, 1863, vol. II,
pp. 469-566.

24/ J. Neyman and F. N. David, "Extension of the Markoff
Theorem on Least Squares," Statist. Research Mem., 2 (1938), p. 105.
The solution (4.17) of the generalized normal equations may
be written:

\[ (a_{po} - a_p) = R_{p1} \left[ G_i \, \frac{\partial G_i}{\partial a_1} \right]_o + \ldots + R_{pu} \left[ G_i \, \frac{\partial G_i}{\partial a_u} \right]_o \qquad (5.1) \]

Define the new variable $h_{pi}$:

\[ h_{pi} = R_{p1} \left( \frac{\partial G_i}{\partial a_1} \right)_o + \ldots + R_{pu} \left( \frac{\partial G_i}{\partial a_u} \right)_o \qquad (5.2) \]

Using this variable, (5.1) may now be expressed:

\[ a_p = a_{po} - [h_{pi} G_i]_o \qquad (5.3) \]

Thus we have expressed $a_p$ as a linear function of the normalized
deviations, $G_i = (F_i/s_i)$. In accordance with the rules for the
propagation of variance, we obtain the following approximate
expression for the square of the standard error of $a_p$:

\[ s_{ap}^2 = [h_{pi}^2 \, s_{Gi}^2]_o \qquad (5.4) \]

A pooled estimate of $s_G^2$, independent of the within-group estimate
of variance, $s_i^2$, may be obtained from the following:

\[ s_G^2 = \frac{1}{n - u} \, [G_i^2] = S/(n - u) \qquad (5.5)\dagger \]

The above between-groups estimate of variance was obtained by
increasing the mean square value of the n expected deviations, $G_i$,
from the fitted functional relationship by the factor n/(n - u) to
allow for the bias resulting from the fact that u parameters were
determined in fitting the n points to the function. A discussion of
the distribution of $s_G^2$ is presented in Section 7. Note that $G_i$ has
been normalized so that its expected variance is approximately the
same for all values of i even in those cases where the observed
variances, $s_i^2$, vary from point to point by more than would be
expected for sample variances from the same population. Since $s_G^2$
is a constant, independent of i, $s_{Gi}^2$ may be replaced by S/(n - u)
in (5.4) and removed from under the summation sign; we then obtain
the following estimate of the variance of $a_p$:

\[ s_{ap}^2 = \frac{S}{n - u} \left[ \left\{ R_{p1} \, \frac{\partial G_i}{\partial a_1} + \ldots + R_{pu} \, \frac{\partial G_i}{\partial a_u} \right\}^2 \right]_o \qquad (5.6) \]

Remembering that $M R_{pq}$ is the cofactor of $A_{pq}$ in the determinant
M, (5.6) may be expressed:


† In the special case where the number of points equals the
number of unknown parameters (n = u) the function F can usually
be fitted exactly to these n points so that $F_i = 0$ (i = 1 to u); thus
$[G_i^2] = 0$ and the between-groups estimate of variance $s_G^2 = 0$ even
though $s_i^2 \ne 0$ for all n points. In this case the confidence intervals
for the parameters and the confidence region for the fitted function
must be determined from the within-groups estimate of the variance.
An example of this use of the within-groups estimate of variance is
given in Section 12.
(5.7)

that is, the determinant $|A_{pq}|$ with the p-th row replaced by


r 9G.  1 

,v  aaj  , 

LV  8a  J 
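When (5.2) is substituted in (5.7) we obtain:

\[
M s_{ap}^2 \, (n - u)/S = R_{p1}
\begin{vmatrix}
A_{11} & \cdots & A_{1u} \\
\vdots & & \vdots \\
\left[ \frac{\partial G_i}{\partial a_1} \frac{\partial G_i}{\partial a_1} \right]_o & \cdots & \left[ \frac{\partial G_i}{\partial a_1} \frac{\partial G_i}{\partial a_u} \right]_o \\
\vdots & & \vdots \\
A_{u1} & \cdots & A_{uu}
\end{vmatrix}
+ \ldots + R_{pu}
\begin{vmatrix}
A_{11} & \cdots & A_{1u} \\
\vdots & & \vdots \\
\left[ \frac{\partial G_i}{\partial a_u} \frac{\partial G_i}{\partial a_1} \right]_o & \cdots & \left[ \frac{\partial G_i}{\partial a_u} \frac{\partial G_i}{\partial a_u} \right]_o \\
\vdots & & \vdots \\
A_{u1} & \cdots & A_{uu}
\end{vmatrix}
\qquad (5.8)
\]

where in each determinant it is the p-th row that has been replaced.
Consider first the case where the second term in (4.7) is zero for all
combinations of pq. In this case each of the determinants in (5.8)
except the one multiplied by $R_{pp}$ is equal to zero since two rows
are then identical. The determinant multiplied by $R_{pp}$ is just equal
to M, and we obtain for this special case:

\[ s_{ap}^2 = R_{pp} \, \frac{S}{n - u} \qquad (5.9) \]

The above elegant solution for the estimated variance of the
estimated parameter $a_p$ (p = 1 to u) was first obtained by Gauss.
The corresponding result using the generalized normal equations
may be obtained by substituting the following (from 4.7) into (5.8):

\[ \left[ \frac{\partial G_i}{\partial a_p} \, \frac{\partial G_i}{\partial a_q} \right]_o = A_{pq} - \left[ G_i \, \frac{\partial^2 G_i}{\partial a_p \, \partial a_q} \right]_o \]

\[ s_{ap}^2 = \frac{S}{n - u} \left\{ R_{pp} - \sum_{v=1}^{u} R_{pv} \sum_{w=1}^{u} R_{pw} \left[ G_i \, \frac{\partial^2 G_i}{\partial a_v \, \partial a_w} \right]_o \right\} \qquad (5.10) \]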


In similar fashion we may determine the covariance:

\[ r_{pq} \, s_{ap} \, s_{aq} = \frac{S}{n - u} \left\{ R_{pq} - \sum_{v=1}^{u} \sum_{w=1}^{u} R_{pv} R_{qw} \left[ G_i \, \frac{\partial^2 G_i}{\partial a_v \, \partial a_w} \right]_o \right\} \qquad (5.11) \]

The above may also be expressed after considerable
manipulation as follows:

\[ r_{pq} \, s_{ap} \, s_{aq} = \frac{S}{n - u} \sum_{v=1}^{u} \sum_{w=1}^{u} R_{pv} R_{qw} \left[ \frac{\partial G_i}{\partial a_v} \, \frac{\partial G_i}{\partial a_w} \right]_o \qquad (5.12) \]

Equations (5.11) and (5.12) may be used for calculating the
variance (5.10) by noting that $r_{pq} = 1$ when p = q.

In numerical work it is desirable to use both (5.11) and (5.12)
for evaluating these variances since this provides a very valuable
check on the calculations.
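The numerical check suggested above can be sketched as follows. The code assumes the two covariance expressions take the forms $\frac{S}{n-u}\{R_{pq} - \sum_v \sum_w R_{pv} R_{qw} [G_i \, \partial^2 G_i/\partial a_v \partial a_w]_o\}$ and $\frac{S}{n-u} \sum_v \sum_w R_{pv} R_{qw} [(\partial G_i/\partial a_v)(\partial G_i/\partial a_w)]_o$, with R the reciprocal of the full coefficient matrix of the generalized normal equations. The function name and the small data set are hypothetical.

```python
import numpy as np

def parameter_covariances(G, dG, d2G):
    """Covariance matrix of the estimated parameters, computed three ways.

    G   : normalized deviations G_i at the solution, shape (n,)
    dG  : dG_i/da_p, shape (n, u)
    d2G : d2G_i/(da_p da_q), shape (n, u, u)
    Returns (with_second_derivatives, from_first_derivatives, gauss).
    """
    G = np.asarray(G, dtype=float)
    dG = np.asarray(dG, dtype=float)
    d2G = np.asarray(d2G, dtype=float)
    n, u = dG.shape
    S = G @ G                                   # sum of squared normalized deviations
    first = dG.T @ dG                           # products of first derivatives
    second = np.einsum("i,ipq->pq", G, d2G)     # terms in G_i times second derivatives
    R = np.linalg.inv(first + second)           # reciprocal of the coefficient matrix
    scale = S / (n - u)
    cov_a = scale * (R - R @ second @ R)        # form with the second-derivative correction
    cov_b = scale * (R @ first @ R)             # form using first derivatives only
    gauss = scale * R                           # Gauss' result, no correction
    return cov_a, cov_b, gauss

# Hypothetical values for n = 4 points and u = 2 parameters.
G = np.array([0.5, -0.3, 0.2, 0.1])
dG = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
d2G = np.zeros((4, 2, 2))
d2G[:, 1, 1] = 0.1
cov_a, cov_b, gauss = parameter_covariances(G, dG, d2G)
```

The two generalized forms agree to rounding error, which is the check on the arithmetic; both reduce to Gauss' result when the second-derivative terms vanish.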
Finally a between-groups estimate, $s_F^2$, of the variance
of the function F, corresponding to a given point, $X^*_j$ (j = 1 to k),
on the fitted relation with its parameters determined by least squares
from the n observations, may be determined by applying the usual
rules for the propagation of variance 7/ to the function
$F(X^*_1, \ldots, X^*_j, \ldots, a_1, \ldots, a_p, \ldots, a_u)$:

\[ s_F^2 = \sum_{p=1}^{u} \sum_{q=1}^{u} r_{pq} \, s_{ap} \, s_{aq} \left( \frac{\partial F}{\partial a_p} \right)_o \left( \frac{\partial F}{\partial a_q} \right)_o \qquad (5.13) \]

In the above $r_{pq} = 1$ when p = q. The approximation in the above,
aside from the fact that $r_{pq} s_{ap} s_{aq}$ is necessarily the sample rather
than the population value, arises from the fact that only the linear
terms in the Taylor's series expansion of F were retained in its
derivation. In evaluating the derivatives $(\partial F/\partial a_p)_o$ and $(\partial F/\partial a_q)_o$ it is
important to notice that the same estimated values of the parameters
are to be used as were used in calculating $R_{pq}$. It is then possible to
show that the right hand member of (5.13) is inherently greater than
or equal to zero. The iterative process of using successively better
estimated values of the parameters should be continued until two
successive determinations of $s_F^2$ do not differ appreciably.
Comparing (5.9) with (5.10), we see that the correction to Gauss' solution
will vanish in the limit as the deviations, $G_i$, approach zero; thus we
see that (5.11) or (5.12) provide the first order correction to allow for
the finite sizes of the deviations which, in Gauss' solution, were
assumed to be negligibly small. The estimate of variance $s_F^2$ given
by (5.13) represents an estimate of the square of the standard error
of the function determined by our sample of n sets of observations
relative to a hypothetical "true" value for the function which one would
expect to determine if all $m_i$ (i = 1 to n) were allowed to approach
infinity.
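The propagation of variance in (5.13) amounts to a quadratic form in the covariance matrix of the estimated parameters. A minimal sketch follows; the function name, the covariance values, and the evaluation point are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def function_variance(dF_da, cov_a):
    """Variance of the fitted function from (5.13).

    dF_da : derivatives dF/da_p at the point of interest, shape (u,)
    cov_a : matrix with elements r_pq * s_ap * s_aq, shape (u, u)
    """
    g = np.asarray(dF_da, dtype=float)
    return float(g @ np.asarray(cov_a, dtype=float) @ g)

# Straight line F = Y - a - b*X evaluated at X* = 2, so that
# dF/da = -1 and dF/db = -X* (hypothetical covariance values).
cov_a = np.array([[0.04, -0.01], [-0.01, 0.005]])
s_F2 = function_variance([-1.0, -2.0], cov_a)
```

Because the covariance matrix is positive semi-definite when the derivatives and $R_{pq}$ are evaluated at the same parameter values, the result cannot be negative, in line with the remark in the text.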




6.  SYSTEMATIC METHOD FOR THE NUMERICAL SOLUTION
OF THE GENERALIZED NORMAL EQUATIONS

The method described in this section follows closely that
proposed by Deming in Chapter IX of reference 7, and the student
is referred to his book for proofs of the statements made in this
section. The method will be discussed in detail for the case of
fitting a straight line involving 2 unknown parameters. An explicit
form for determining 3 unknown parameters is then given, and it
will be clear from these examples how the method may be extended
to more than 3 unknown parameters. Incidental to the determination
of the u parameters, this method also provides u + 1 different values
of the sum S of the weighted squares of the deviations from the
function: thus $S(a_{po})$ is the first estimate of S determined by using the
estimated values $a_{po}$, and then u successively smaller values of S
are obtained as the u least squares values of $a_p$ are determined
successively and used in the calculations. Furthermore, the method
also yields estimates for the variances and covariances of the u
estimates of the parameters.

6.1  Fitting  a Straight  Line 

First  it  is  necessary  to  determine  approximate  values  a and 

$b_0$ for the parameters. This may be done in a variety of ways, but plotting the data and fitting a line by eye is usually the simplest. In some cases these estimates need not be very accurate and, in fact, it will sometimes be convenient simply to let $a_0 = b_0 = 0$; however, if the relative weights depend on the value of $b$, it will be desirable to use a good estimate of $b$ at the outset. The estimated values $a_0$ and $b_0$ are entered at the top of the tabulation form given on the next page. Next, the weights $w_i$ ($i = 1$ to $n$) are calculated and used to calculate the 5 sums $A_{11}$, $A_{12}$, $A_{22}$, $A_{10}$, and $A_{20}$ defined on page 4.6, together with $S(a_0, b_0) = [w_i F_i^2]_0$. These six values are then entered on the tabulation form. The Arabic numerals in parentheses (1), (2), ..., (35) indicate the preferred order of calculation and entry on the tabulation form of the 35 numbers required for a complete solution.

[Tabulation form: The Iterative Solution of the Generalized Normal Equations for Fitting a Straight Line]

Row 4 is now obtained by multiplying the values in Row I by $(-A_{12}/A_{11})$; thus item (7) is $(-A_{12}/A_{11})$, item (8) is $(-A_{12}/A_{11}) \cdot A_{12}$, and item (9) is $(-A_{12}/A_{11}) \cdot A_{10}$. Items (10), (11), and (12) are obtained by adding the numbers immediately above in Rows 2 and 4. Item (13) is $(-A_{10}/A_{11})$, item (14) is $(-A_{10}/A_{11}) \cdot A_{10}$, item (15) is $(-B_{20}/B_{22})$, and item (16) is $(-B_{20}/B_{22}) \cdot B_{20}$. Item (17) is obtained by adding the numbers immediately above it in Rows 3, 6, and 7, i.e., $S(a, b) = S(a_0, b_0) + (14) + (16)$. Note that (14) and (16) are necessarily negative, and (14) represents the reduction of $S$ arising from the change of $a_0$ to $a$ while (16) represents the additional reduction of $S$ arising from the change of $b_0$ to $b$. Next, item (18) is determined by considering Row II to be the following equation for $(b - b_0)$:


$$B_{22}(b - b_0) = B_{20} \tag{6.1}$$

Thus $(b - b_0)$ = {item (11)/item (10)}, and this is item (18). Next, using this value of $(b - b_0)$, Row I may now be solved for $(a - a_0)$:

$$A_{11}(a - a_0) + A_{12}(b - b_0) = A_{10} \tag{6.2}$$


At this point in the calculations, a decision should be made regarding the adequacy of the original estimates $a_0$ and $b_0$; thus we should determine the new estimates $a_0 + (a - a_0)$ and $b_0 + (b - b_0)$ of $a$ and $b$, respectively, and repeat the entire procedure outlined above using these new estimates, the revised results being entered on a new tabulation form. In many cases the second set of calculations will lead to values of $(a - a_0)$ and $(b - b_0)$ equal to zero; if not, the above procedure should be repeated until the calculated values of $(a - a_0)$ and $(b - b_0)$ are sufficiently near to zero.
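The elimination steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `solve_two_parameter_step` is hypothetical, the five sums are taken as already computed, and the numerical inputs are the column 5 entries of Table 6.1 as read from this report.

```python
def solve_two_parameter_step(A11, A12, A22, A10, A20, S0):
    """One pass through the two-parameter tabulation form (hypothetical helper)."""
    mult_row4 = -A12 / A11               # item (7): multiplier applied to Row I
    B22 = A22 + mult_row4 * A12          # item (10): Row 2 + Row 4
    B20 = A20 + mult_row4 * A10          # item (11): Row 2 + Row 4
    item14 = (-A10 / A11) * A10          # reduction of S from the change in a
    item16 = (-B20 / B22) * B20          # further reduction from the change in b
    S = S0 + item14 + item16             # item (17)
    db = B20 / B22                       # item (18): Row II solved for (b - b0)
    da = (A10 - A12 * db) / A11          # Row I solved for (a - a0)
    return da, db, S

# Inputs read from column 5 of Table 6.1 (a0, b0, A11, A12, A22, A10, A20, S0).
a0, b0 = 0.883217, 0.629517
da, db, S = solve_two_parameter_step(
    3.338531, 15.874466, 82.152089, -0.007691, -0.054371, 3.125100
)
a_new, b_new = a0 + da, b0 + db
print(f"a = {a_new:.6f}, b = {b_new:.6f}, S = {S:.6f}")
```

The new estimates obtained this way agree with the starting values listed at the top of column 6 of Table 6.1 to within rounding of the tabulated inputs.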


We may now proceed with the calculations of items (20) through (35). Thus $R_{21}$ and $R_{11}$ are determined by considering that they are the values of $(b - b_0)$ and $(a - a_0)$, respectively, which would be obtained by replacing column $C_0$ by column $C_1$; thus $R_{21}$ = {item (12)/$B_{22}$} and $R_{11} = \{(1 - A_{12}R_{21})/A_{11}\}$. Similarly, $R_{22}$ and $R_{12}$ are determined by replacing column $C_0$ by column $C_2$; thus $R_{22} = 1/B_{22}$ and $R_{12} = -A_{12}R_{22}/A_{11}$. Since $R_{12} = R_{21}$, we obtain a check on our calculations at this point; thus item (20) must be equal to item (23).

Furthermore, we may now check $(a - a_0)$ and $(b - b_0)$ and enter the results as (24) and (25) by using the equations (4.17):

$$(a - a_0) = R_{11}A_{10} + R_{12}A_{20} \tag{6.3}$$

$$(b - b_0) = R_{21}A_{10} + R_{22}A_{20} \tag{6.4}$$

We have already determined that $(a - a_0)$ and $(b - b_0)$ are near zero, and the above equations simply indicate that this will be true only to the extent that $A_{10}$ and $A_{20}$ are near zero. Items (26) and (27) represent our final estimates of $a$ and $b$.

Item (28) is the determinant $M$ defined by (4.16):

$$M = A_{11} \cdot B_{22} \tag{6.5}$$

When $M$ is small it is indicative of a condition of near indeterminacy in the solution for the parameters; this possible difficulty is discussed by Deming 7/ who also gives other references.
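As an illustration of the relations just described, the following Python sketch computes the elements $R_{11}$, $R_{12}$, $R_{21}$, $R_{22}$ and the determinant $M$ from the three sums $A_{11}$, $A_{12}$, $A_{22}$, and then applies (6.3) and (6.4). The function name is hypothetical and the inputs are again the column 5 entries of Table 6.1; it is a sketch, not a reproduction of the tabulation form itself.

```python
def inverse_elements(A11, A12, A22):
    """Elements R_pq and determinant M as described in the text (hypothetical helper)."""
    B22 = A22 - A12 * A12 / A11
    R21 = (-A12 / A11) / B22          # item (12) divided by B22
    R11 = (1.0 - A12 * R21) / A11
    R22 = 1.0 / B22
    R12 = -A12 * R22 / A11            # must equal R21 (numerical check)
    M = A11 * B22                     # equation (6.5)
    return R11, R12, R21, R22, M

# Column 5 of Table 6.1.
A11, A12, A22 = 3.338531, 15.874466, 82.152089
A10, A20 = -0.007691, -0.054371
R11, R12, R21, R22, M = inverse_elements(A11, A12, A22)

# Corrections from (6.3) and (6.4).
da = R11 * A10 + R12 * A20
db = R21 * A10 + R22 * A20
print(R11, R12, R22, M, da, db)
```

With these inputs the computed $R_{11}$, $R_{12}$, $R_{22}$, and $M$ agree with the column 5 entries of Table 6.1 to the precision of the tabulated sums.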


Item (29) is the estimate $s_G^2$; if $s_G^2$ is much larger than unity, it will be desirable to use the tests described in Section 7 for determining the statistical plausibility of the solution. A very large or very small value of $s_G^2$ may, of course, simply indicate numerical errors in the calculations in some cases.


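Items (30), (31), and (32) are obtained by using (5.10):

$$s_a^2 = s_G^2\left\{R_{11} - 2R_{11}R_{12}\left[G_i\frac{\partial^2 G_i}{\partial a\,\partial b}\right]_0 - R_{12}^2\left[G_i\frac{\partial^2 G_i}{\partial b^2}\right]_0\right\} \tag{6.6}$$

$$s_b^2 = s_G^2\left\{R_{22} - 2R_{12}R_{22}\left[G_i\frac{\partial^2 G_i}{\partial a\,\partial b}\right]_0 - R_{22}^2\left[G_i\frac{\partial^2 G_i}{\partial b^2}\right]_0\right\} \tag{6.7}$$

$$r_{ab}s_a s_b = s_G^2\left\{R_{12} - (R_{11}R_{22} + R_{12}^2)\left[G_i\frac{\partial^2 G_i}{\partial a\,\partial b}\right]_0 - R_{12}R_{22}\left[G_i\frac{\partial^2 G_i}{\partial b^2}\right]_0\right\} \tag{6.8}$$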

We see by (4.7) that the terms involving $G_i$ in the above expression are simply components of $A_{12}$ and $A_{22}$ and, by appropriately arranging the calculations, can be obtained during the course of the evaluation of $A_{12}$ and $A_{22}$; these terms may be expressed as follows:

$$\left[G_i\frac{\partial^2 G_i}{\partial a\,\partial b}\right]_0 = \left[w_i^2 F_i\,(b\,s_{Xi}^2 - r_i s_{Xi}s_{Yi})\right]_0 \tag{6.9}$$

$$\left[G_i\frac{\partial^2 G_i}{\partial b^2}\right]_0 = \left[2w_i^2 F_i X_i\,(b\,s_{Xi}^2 - r_i s_{Xi}s_{Yi}) + 3w_i^3 F_i^2\,(b\,s_{Xi}^2 - r_i s_{Xi}s_{Yi})^2 - w_i^2 F_i^2 s_{Xi}^2\right]_0 \tag{6.10}$$

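As a valuable numerical check on the above calculations, items (33), (34), and (35) may next be calculated using the following expressions:

$$s_a^2 = s_G^2\left\{R_{11}^2\left[\left(\frac{\partial G_i}{\partial a}\right)^2\right]_0 + 2R_{11}R_{12}\left[\frac{\partial G_i}{\partial a}\frac{\partial G_i}{\partial b}\right]_0 + R_{12}^2\left[\left(\frac{\partial G_i}{\partial b}\right)^2\right]_0\right\} \tag{6.11}$$

$$s_b^2 = s_G^2\left\{R_{12}^2\left[\left(\frac{\partial G_i}{\partial a}\right)^2\right]_0 + 2R_{12}R_{22}\left[\frac{\partial G_i}{\partial a}\frac{\partial G_i}{\partial b}\right]_0 + R_{22}^2\left[\left(\frac{\partial G_i}{\partial b}\right)^2\right]_0\right\} \tag{6.12}$$

$$r_{ab}s_a s_b = s_G^2\left\{R_{11}R_{12}\left[\left(\frac{\partial G_i}{\partial a}\right)^2\right]_0 + (R_{11}R_{22} + R_{12}^2)\left[\frac{\partial G_i}{\partial a}\frac{\partial G_i}{\partial b}\right]_0 + R_{12}R_{22}\left[\left(\frac{\partial G_i}{\partial b}\right)^2\right]_0\right\} \tag{6.13}$$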
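The terms involving $G_i$ in the above are the remaining components of $A_{11}$, $A_{12}$, and $A_{22}$:

$$\left[\left(\frac{\partial G_i}{\partial a}\right)^2\right]_0 = [w_i]_0 \tag{6.14}$$

$$\left[\frac{\partial G_i}{\partial a}\frac{\partial G_i}{\partial b}\right]_0 = \left[w_i X_i + w_i^2 F_i\,(b\,s_{Xi}^2 - r_i s_{Xi}s_{Yi})\right]_0 \tag{6.15}$$

$$\left[\left(\frac{\partial G_i}{\partial b}\right)^2\right]_0 = \left[w_i X_i^2 + 2w_i^2 F_i X_i\,(b\,s_{Xi}^2 - r_i s_{Xi}s_{Yi}) + w_i^3 F_i^2\,(b\,s_{Xi}^2 - r_i s_{Xi}s_{Yi})^2\right]_0 \tag{6.16}$$

This completes the general description of the calculations required for the tabulation forms.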
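The two routes to the variances, (6.6) to (6.8) and the check expressions (6.11) to (6.13), can be compared numerically. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the $R$ elements are obtained from the closed-form inverse of the 2 x 2 matrix of sums rather than from the tabulation form, the inputs are the column 6 entries of Table 6.1, and $s_G^2$ is taken equal to the tabulated $S$ (that is, $n - 2 = 1$ is assumed, which reproduces the tabulated variances).

```python
def parameter_variances(A11, A12, A22, H12, H22, J11, J12, J22, sG2):
    """Variances and covariance by (6.6)-(6.8) and by (6.11)-(6.13).

    H12, H22 stand for [G d2G/da db]_0 and [G d2G/db2]_0;
    J11, J12, J22 stand for the sums of products of first derivatives.
    """
    M = A11 * A22 - A12 * A12
    R11, R12, R22 = A22 / M, -A12 / M, A11 / M   # closed-form 2x2 inverse
    primary = (
        sG2 * (R11 - 2 * R11 * R12 * H12 - R12 ** 2 * H22),
        sG2 * (R22 - 2 * R12 * R22 * H12 - R22 ** 2 * H22),
        sG2 * (R12 - (R11 * R22 + R12 ** 2) * H12 - R12 * R22 * H22),
    )
    check = (
        sG2 * (R11 ** 2 * J11 + 2 * R11 * R12 * J12 + R12 ** 2 * J22),
        sG2 * (R12 ** 2 * J11 + 2 * R12 * R22 * J12 + R22 ** 2 * J22),
        sG2 * (R11 * R12 * J11 + (R11 * R22 + R12 ** 2) * J12 + R12 * R22 * J22),
    )
    return primary, check

# Column 6 of Table 6.1; sG2 = S is an assumption (n - 2 = 1).
primary, check = parameter_variances(
    A11=3.322707, A12=15.798286, A22=81.828363,
    H12=-0.120688, H22=-9.024872,
    J11=3.322707, J12=15.918974, J22=90.853235,
    sG2=3.125042,
)
print(primary)
print(check)
```

Both routes give the same three numbers here, and they agree with the last three rows of column 6 in Table 6.1 to within the rounding of the tabulated sums.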


In most cases the procedure up to item (19) described above need not be carried out more than two times. However, the following example indicates how slowly the iterative process may converge in an extreme case. Thus, for the linear example of Table 2.1 with $r_i = 0.9$ and the variances not pooled, six repetitions of the above-described iterative process were required before $(a - a_0)$ and $(b - b_0)$ were considered negligible. Table 6.1 lists the results of calculations on the form using successively better estimates $a_0$ and $b_0$. The initial value of $b_0$ was arbitrarily set equal to 1 and a corresponding value of $a_0$ determined by setting $A_{10} = 0$; in subsequent steps, however, the previous estimates of both $a$ and $b$ were used. In the fourth column, the values of $a_0$ and $b_0$ were taken to be approximately the averages of the two previous estimates. Note that only the values in the last column represent the least squares solution, and then only to the extent that they may be considered to be calculated with sufficient accuracy. In the present problem, since the standard errors of the parameters, and thus of the function, were found to be so large, it would have been satisfactory to use column 4 as a final solution. The process was carried further to show how it converged. Note that the calculations in the first five columns in Table 6.1 need only have been carried to the point at which the new estimates of $a$ and $b$ became available; the other calculations shown on Table 6.1 are included simply to indicate the behavior of these items in the iterative process.
Table 6.1

The Iterative Solution of the Normal Equations for the
Linear Example of Table 2.1 with r_i = +0.9, Variances Not Pooled

Column                              1            2            3            4            5            6

a_0                            -0.596522    -0.182254     0.911686     0.869        0.883217     0.893603
b_0                             1.000000     0.865564     0.582203     0.614        0.629517     0.626848
w_1                             2.200627     1.832055     1.166980     1.228491     1.259757     1.254321
w_2                             1.307844     1.667553     1.378206     1.471759     1.515409     1.508034
w_3                             0.876429     0.824808     0.511546     0.545986     0.563365     0.560352
A_10                            0            0.177771     0.596571     0.278496    -0.007691    -0.000016
A_11                            4.384900     4.324416     3.056732     3.246236     3.338531     3.322707
[(dG_i/da)(dG_i/db)]_0         16.385947    18.167640    14.089969    15.314429    15.996677    15.918974
[G_i d2G_i/da db]_0            -2.873803    -2.100252    -0.605595    -0.340995    -0.122211    -0.120688
A_12                           13.512144    16.067388    13.484374    14.973434    15.874466    15.798286
A_20                           -1.683430    -0.110643     3.049609     1.393608    -0.054371    -0.000033
[(dG_i/db)^2]_0                87.637878    98.910118    77.540889    86.262304    91.334105    90.853235
[G_i d2G_i/db2]_0             -33.477844   -36.490220   -11.477707   -10.170050    -9.182016    -9.024872
A_22                           54.160034    62.419898    66.063182    76.092254    82.152089    81.828363
S(a_0, b_0) = [G_i^2]_0         3.671059     3.714150     3.255054     3.150050     3.125100     3.125042
S(a, b)                         3.714150     3.255054     3.150050     3.125100     3.125042
a                              -0.182254     0.911686     0.826615     0.883217     0.893603     0.893567
b                               0.865564     0.582203     0.645728     0.629517     0.626848     0.626854
M                              54.908298    11.768649    20.109100    22.809688    22.268625    22.305834
R_11                            0.986372     5.303914     3.285238     3.335962     3.689141     3.668474
R_12                           -0.246086    -1.365270    -0.670561    -0.656451    -0.712862    -0.708258
R_22                            0.079859     0.367452     0.152007     0.142318     0.149921     0.148961
s_a^2                           5.941998   159.3493      18.807762    19.609167    24.102001    23.651773
s_b^2                           0.662294    11.83746      0.956191     0.896478     1.031835     1.011737
r_ab s_a s_b                   -1.848725   -43.319318    -4.120100    -4.087972    -4.788318    -4.793551



The page following Table 6.1 contains the tabulation form with the values of the 35 items entered for this same problem beginning with the final estimates $a_0$ and $b_0$ at the top of Column 6 in Table 6.1.

The tabulation form for fitting a straight line may also be used for the calculations required in fitting an arbitrary function involving any two unknown parameters provided $A_{11}$, $A_{12}$, $A_{22}$, $A_{10}$, and $A_{20}$ are defined by (4.7) and (4.8).

6.2 Fitting an Arbitrary Function with 3 or More
Unknown Parameters

In the case of 3 unknown parameters, the tabulation form on the following page may be used. The method of calculation of the various entries on this form should be clear from the discussion in the preceding subsection 6.1. The extension of the tabulation forms and methods of calculation to cover the case of more than 3 unknown parameters should be clear from the above-described tabulation forms for 2 and 3 unknown parameters.
The Iterative Solution of the Generalized Normal Equations for Determining u = 3 Parameters and Their Variances

a_10 = ____    a_20 = ____    a_30 = ____

Columns: (a_1 - a_10), (a_2 - a_20), (a_3 - a_30), C_0, C_1, C_2, C_3

Row   How obtained                  Entries (item numbers in parentheses)

I     (sums)                        (1) A_11, (2) A_12, (4) A_13, (7) A_10, 1, 0, 0
2     (sums)                        (3) A_22, (5) A_23, (8) A_20, 0, 1, 0
3     (sums)                        (6) A_33, (9) A_30, 0, 0, 1
4     (sum)                         (10) [w_i F_i^2]_0
5     -(A_12/A_11) . (I)            (12), (13), (14), (11), 0, 0
II    2 + 5                         (15) B_22, (16) B_23, (17) B_20, (18), 1, 0
7     -(A_13/A_11) . (I)            (20), (21), (19), 0, 0
8     -(B_23/B_22) . (II)           (23), (24), (25), (22), 0
III   3 + 7 + 8                     (26) C_33, (27) C_30, (28), (29), 1
10    -(A_10/A_11) . (I)            (31), (30), 0, 0
11    -(B_20/B_22) . (II)           (33), (32), 0
12    -(C_30/C_33) . (III)          (35), (34)
IV    4 + 10 + 11 + 12              (36) S(a_1, a_2, a_3)
14    I solved for (a_1 - a_10)     (39) (a_1 - a_10), (49) (a_1 - a_10), (42) R_11, (45) R_12, (48) R_13
15    II solved for (a_2 - a_20)    (38) (a_2 - a_20), (50) (a_2 - a_20), (41) R_21, (44) R_22, (47) R_23
16    III solved for (a_3 - a_30)   (37) (a_3 - a_30), (51) (a_3 - a_30), (40) R_31, (43) R_32, (46) R_33
17    M = A_11 . B_22 . C_33        (55) M, (52) a_1, (57) s_a1^2
18    s_G^2 = S(a_1, a_2, a_3)/(n - 3)   (56) s_G^2, (53) a_2, (58) r_12 s_a1 s_a2, (60) s_a2^2
19                                  (54) a_3, (59) r_13 s_a1 s_a3, (61) r_23 s_a2 s_a3, (62) s_a3^2
20                                  (63) s_a1^2
21                                  (64) r_12 s_a1 s_a2, (66) s_a2^2
22                                  (65) r_13 s_a1 s_a3, (67) r_23 s_a2 s_a3, (68) s_a3^2
7. THE DISTRIBUTION OF S

Throughout this section it will be assumed that each of the observed random variables is from a normally distributed population, 25/ and it will be shown, on this assumption, that $S/\nu_1$ (where $S = [w_i F_i^2]$ is the minimized sum defined in (4.1)) is sometimes exactly but is always approximately† distributed as Fisher's variance ratio $F(\nu_1, \nu_2)$ with $\nu_1$ and $\nu_2$ degrees of freedom:

$$\nu_1 = n - u \tag{7.1}$$

$$\nu_2 = \frac{[s_i^2]^2}{[s_i^4/(m_i - 1)]} \quad \text{(Variances not pooled)} \tag{7.2}$$

$$\nu_2 = [m_i - 1] \quad \text{(Variances pooled)} \tag{7.3}$$

Here, as before, $u$ denotes the number of unknown parameters estimated in minimizing $S$, and $s_i^2$ is defined by (4.2). This distribution may be used to detect statistically significant departures of the observed data from the statistical models assumed in deriving $S$. Thus if the observed value of $(S/\nu_1)$ is larger than $F(\nu_1, \nu_2, p)$ for the probability level $p$ chosen in advance as the minimum value consistent with accepting the model, we may conclude (a) that the observed data contain a statistically significant component of variance arising from the presence of random systematic errors, (b) that the form of the function fitted to the data is incorrect, or (c) that a combination of both of the above factors is responsible for the observed large value of $S$.


† The proof given in subsection 7.3 for $k > 1$ depends on the assumptions (a) that the $\rho_i$ have the same values independent of $i$ for all $n$ groups of observations $i = 1$ to $n$ and (b) that the ratios of the population variances $\sigma_{\eta_j i}^2/\sigma_{\eta_h i}^2$ have the same values $C_{jh}$ independent of $i$ for all $n$ groups of observations $i = 1$ to $n$. These assumptions may not be necessary, but they are at least sufficient.

25/ Kenneth A. Norton and Eugene Barrows, "The Kolmogorov Test of the Goodness of Fit of Data Samples to Independently Specified Continuous Distributions, Together with a Test of the Normality of a Small Sample," NBS Report 5070, July, 1957.


When  the  same  population  variances  cannot  be  assumed  for 
all  n groups,  this  approximate  distribution  of  S provides  the  only 
satisfactory  means  presently  available  for  testing  our  models.  It 
should  be  noted  that  the  statistical  tests  for  heterogeneity  of  variance 
will  often  indicate  a common  population  variance  when,  in  fact,  the 
actual  population  variances  differ.  Thus,  in  those  cases  where  the 
experimenter  has  reason  to  suspect  different  population  variances, 
it  will  be  better  to  obtain  the  solution  without  pooling  the  variances 
even  though  the  statistical  tests  for  heterogeneity  of  variance  indicate 
no  statistically  significant  differences.  This  point  is  emphasized  here 
since  there  may  be  a tendency  to  attempt  to  justify  the  simpler  method 
of pooled variances by statistical tests alone, and this will lead
occasionally to an incorrect acceptance or rejection of a proposed model.


7.1 One Variable Subject to Error (k = 1) and
the Variances Pooled

We will consider initially in this subsection the case of one variable ($k' = 1$), and will assume that $n$ groups of observations are used to estimate the population mean value $\alpha$ which is assumed to be the same for all groups. The least squares estimate $a$ for $\alpha$ may be expressed:

$$a = \frac{[w_i Y_i]}{[w_i]} \tag{7.4}$$

$$Y_i = \frac{1}{m_i}\sum_{t=1}^{m_i} Y_{it} \tag{7.5}$$

$$w_i = m_i/s_{\eta i}^2 \tag{7.6}$$

$$S = [w_i(Y_i - a)^2] = [w_i\{(Y_i - \alpha) - (a - \alpha)\}^2] \tag{7.7}$$

$$S = [w_i(Y_i - \alpha)^2 - 2w_i(Y_i - \alpha)(a - \alpha) + w_i(a - \alpha)^2]$$

$$S = [w_i(Y_i - \alpha)^2] - [w_i](a - \alpha)^2 \tag{7.8}$$
Assume now that the variance $\sigma_{\eta i}^2$ is known for each group of observations and let

$$w_i' = m_i/\sigma_{\eta i}^2 \tag{7.9}$$

If we write $z_i = \sqrt{w_i'}\,(Y_i - \alpha)$, $i = 1$ to $n$, it follows that the $z_i$ are independent random variables normally distributed about zero with unit variance; now, if $w_i$ is replaced by $w_i'$, (7.8) and (7.4) may be written

$$S_\sigma = [z_i^2] - [w_i'](a - \alpha)^2 = \left[\frac{m_i}{\sigma_{\eta i}^2}(Y_i - a)^2\right] \tag{7.10}$$

$$\sqrt{[w_i']}\,(a - \alpha) = \left[\frac{\sqrt{w_i'}\;z_i}{\sqrt{[w_i']}}\right] \tag{7.11}$$

The subscript $\sigma$ on $S$ is used to indicate that the weights $w_i'$ are considered to be known constants. Note now that the linear form $\sqrt{[w_i']}\,(a - \alpha) = [c_i z_i]$ has coefficients $c_i$ which satisfy the condition $[c_i^2] = 1$. Thus we may apply Fisher's lemma† to (7.10), and conclude (a) that $S_\sigma$ is distributed exactly as $\chi^2$ with $(n - 1)$ degrees of freedom, or alternatively that $S_\sigma/(n - 1)$ is distributed exactly as Fisher's variance ratio $F(n - 1, \infty)$, and (b) that $(n - 1)[w_i'](a - \alpha)^2/S_\sigma$ is distributed exactly as $F(1, n - 1)$. There are no formal difficulties in extending these conclusions to problems involving $u$ unknown parameters and more than one variable provided the additional variables are not random; for example, this has been done by Cramér in Chapter 37 of reference 18 and the principal change involved is the replacement of $(n - 1)$ by $(n - u)$. These conclusions are not very useful, however, since they depend on an assumed a priori knowledge of the variances $\sigma_{\eta i}^2$. However, if it is reasonable on physical grounds (independently of the observed data) to assume that the $n$ groups have a common population variance $\sigma_\eta^2$, and particularly if, in addition, statistical tests for heterogeneity of the $n$ observed variances indicate

† A good discussion of Fisher's lemma is given by Cramér in reference 18, p. 379. Another discussion, involving more elementary mathematics, is given in Section 10.6, p. 262, of the book by A. Hald. 26/

26/ A. Hald, "Statistical Theory with Engineering Applications," John Wiley and Sons, 1952.
that this assumption is not statistically unreasonable,† we may express $S$ as follows:

$$S = S_\sigma/(s_\eta^2/\sigma_\eta^2) \tag{7.12}$$

$$\text{where}\quad [m_i - 1](s_\eta^2/\sigma_\eta^2) = [(m_i - 1)(s_{\eta i}^2/\sigma_\eta^2)] \tag{7.13}$$

It is well known that each of the terms $(m_i - 1)s_{\eta i}^2/\sigma_\eta^2$ on the right-hand side of (7.13) is distributed as $\chi^2$ with $(m_i - 1)$ degrees of freedom, and it follows that their sum will be distributed as $\chi^2$ with $[m_i - 1]$ degrees of freedom. Finally, since $S/(n - 1)$ may now be expressed as a ratio of two independent mean squares $(u/\nu_1)/(v/\nu_2)$ where $u = S_\sigma$ is distributed as $\chi^2$ with $\nu_1 = (n - 1)$ degrees of freedom and $v = [m_i - 1](s_\eta^2/\sigma_\eta^2)$ is distributed as $\chi^2$ with $\nu_2 = [m_i - 1]$ degrees of freedom, it follows (a) that $S/(n - 1)$ is distributed exactly as $F(n - 1, [m_i - 1])$ and (b) that $(n - 1)[m_i](a - \alpha)^2/(S\,s_\eta^2)$ is distributed exactly as $F(1, n - 1)$. These last two conclusions may be extended (see Cramér, Chapter 37 in reference 18) to problems involving $u$ parameters and $k'$ variables, only one of which is random, simply by replacing $a$ by $a_p$ ($p = 1$ to $u$) and $(n - 1)$ by $(n - u)$. These are the only problems for which the exact distribution of $S$ is readily determinable.

7.2 One Variable Subject to Error (k = 1) and
the Variances not Pooled

We will consider next the one-variate problem for the case where it is not reasonable to assume that the $n$ groups of observations are from populations with the same variance, and will make use of a theorem on quadratic forms derived in a recent paper by Box. 27/

† Note that statistical tests on a particular observed sample cannot provide a sufficient reason for assuming homogeneity of the variances, although repeated tests on many samples might be considered to provide adequate grounds for such an assumption; also, statistical tests for homogeneity of variances are unnecessary in case valid physical reasoning leads to the assumption of a common population variance.

Let $Q$ be the weighted sum of $n$ different $\chi^2$ variates characterized by $\nu_i$ degrees of freedom, $i = 1$ to $n$:


$$Q = \left[\frac{s_{\eta i}^2}{m_i}\right] = \left[\frac{\sigma_{\eta i}^2}{m_i(m_i - 1)} \cdot \frac{(m_i - 1)\,s_{\eta i}^2}{\sigma_{\eta i}^2}\right] = [\lambda_i\,\chi^2(\nu_i)] \tag{7.14}$$

$$\lambda_i = \sigma_{\eta i}^2/m_i(m_i - 1) \tag{7.15}$$

$$\chi^2(\nu_i) = (m_i - 1)\,s_{\eta i}^2/\sigma_{\eta i}^2 \tag{7.16}$$

Note that the $\lambda_i$ are simply constants and that $\nu_i = m_i - 1$.

According to Theorem 3.1 in Box's paper, $Q/g$ will be distributed approximately as $\chi^2(h)$ where:

$$h = [\nu_i\lambda_i]^2/[\nu_i\lambda_i^2] = [\sigma_{\eta i}^2/m_i]^2/[\sigma_{\eta i}^4/m_i^2(m_i - 1)] \tag{7.17}$$

$$gh = [\nu_i\lambda_i] = [\sigma_{\eta i}^2/m_i] \tag{7.18}$$

Now consider the ratio $R$:

$$R = \{S_\sigma/(n - 1)\}/\{Q/gh\} \tag{7.19}$$

Since $S_\sigma$ is distributed exactly as $\chi^2$ with $(n - 1)$ degrees of freedom† and $Q/g$ is distributed approximately as $\chi^2$ with $h$ degrees of freedom and independently of $S_\sigma$, it follows that $R$ is distributed


27/ G. E. P. Box, "Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification," Annals of Mathematical Statistics, vol. 25, June, 1954, pp. 290-302.
approximately as $F(n - 1, h)$. If we now replace the $\sigma_{\eta i}^2$ in (7.17), (7.18), and (7.19) by the estimated values $s_{\eta i}^2$, we find that $gh \cong Q$, $S_\sigma \cong S$, $R \cong S/(n - 1)$,

$$\nu_2 = [s_{\eta i}^2/m_i]^2/[s_{\eta i}^4/m_i^2(m_i - 1)] \cong h \tag{7.20}$$

and conclude that $S/(n - 1)$ is distributed approximately as $F(n - 1, \nu_2)$. Furthermore, we may use the estimates $w_i$ for $w_i'$ in (7.10), and thus conclude that $(n - 1)[w_i](a - \alpha)^2/S$ is distributed approximately as $F(1, n - 1)$ even when the population variances $\sigma_{\eta i}^2$ vary from group to group. These last two conclusions may be extended to problems involving $u$ parameters and $k'$ variables, only one of which is random, simply by replacing $a$ by $a_p$ and $\alpha$ by $\alpha_p$ ($p = 1$ to $u$), and $(n - 1)$ by $(n - u)$. Since $s_{\eta i}$ approaches $\sigma_{\eta i}$ as $m_i$ approaches infinity, the above two approximate distributions for $S$ and for $(a - \alpha)^2$ become exact as all of the $m_i$ are allowed to increase without limit. It is of interest to compare the above approximate solution for the distribution of $S$ with the approximate solution obtained by Welch 28/ for the special case $n = 2$. For this case $(n - 1) = 1$, and we may write:

$$S = \left[\frac{m_i(Y_i - a)^2}{s_{\eta i}^2}\right] = (Y_1 - Y_2)^2\Big/\left(\frac{s_{\eta 1}^2}{m_1} + \frac{s_{\eta 2}^2}{m_2}\right) \tag{7.21}$$

The second expression on the right of (7.21) is readily obtained when we substitute in the middle member of (7.21) the following expression for $a$:

$$a = \left(\frac{m_1 Y_1}{s_{\eta 1}^2} + \frac{m_2 Y_2}{s_{\eta 2}^2}\right)\Big/\left(\frac{m_1}{s_{\eta 1}^2} + \frac{m_2}{s_{\eta 2}^2}\right) \tag{7.22}$$
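The equality of the two members of (7.21) can be confirmed numerically. The short Python sketch below uses hypothetical values for the two group means, variances, and sample sizes; the function names are illustrative only.

```python
def s_weighted(Y, s2, m):
    """Middle member of (7.21): weighted sum about the estimate a of (7.22)."""
    w = [mi / vi for mi, vi in zip(m, s2)]
    a = sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)
    return sum(wi * (yi - a) ** 2 for wi, yi in zip(w, Y))

def s_two_group(Y, s2, m):
    """Right-hand member of (7.21) for n = 2."""
    return (Y[0] - Y[1]) ** 2 / (s2[0] / m[0] + s2[1] / m[1])

# Hypothetical values: group means, sample variances, and sample sizes.
Y, s2, m = [10.0, 12.0], [4.0, 9.0], [5, 10]
left, right = s_weighted(Y, s2, m), s_two_group(Y, s2, m)
print(left, right)
```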


28/ B. L. Welch, "The Generalization of Student's Problem When Several Different Population Variances Are Involved," Biometrika, vol. 34, pp. 28-35, 1947.

† The reader should note that the replacement of $\sigma_{\eta i}^2$ by $s_{\eta i}^2$ increases the variance of the numerator and decreases the variance of the denominator in (7.19); the approximation depends upon the fact that these two effects are approximately compensatory.
Welch concludes, when $\sigma_{\eta 1}^2$ may differ from $\sigma_{\eta 2}^2$, that $S$ is distributed approximately as $F(1, \nu_2)$ with $\nu_2$ defined exactly as in (7.20). A comparison 29/ of Aspin's 30/ tabulated exact results for this special case indicates that Welch's approximation should have adequate accuracy for most applications. Thus we see that our approximate solution has satisfactory accuracy in this special case, and presume that it will also be satisfactory in the general case.

The numerator in (7.19) represents a between-groups estimate of variance, while the denominator in (7.19) represents a within-groups estimate of the same variance. Box 27/ obtained in Section 7 of his paper an expression for the distribution of such a variance ratio without the restriction that the $n$ group population variances must be the same, but his variance ratio was essentially different from (7.19); thus he did not weight either his $Y_i$ or his $(Y_i - a)^2$ in inverse proportion to $\sigma_{\eta i}^2$. Although his solution was appropriate to the simple analysis of variance problem he was considering, our least squares formulation appears to be in a more useful form when no assumption is made about the population variances $\sigma_{\eta i}^2$.
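For the unpooled case, the approximate degrees of freedom of (7.20) and the ratio $S/(n - 1)$ can be computed as in the following Python sketch. The function name and the data are hypothetical; the sketch only illustrates the arithmetic of (7.4), (7.6), (7.7), and (7.20).

```python
def unpooled_f_statistic(groups):
    """Return (S/(n-1), nu1, nu2) with each group weighted by its own variance."""
    n = len(groups)
    m = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    s2 = [sum((y - mu) ** 2 for y in g) / (mi - 1)
          for g, mu, mi in zip(groups, means, m)]              # sample variances
    q = [vi / mi for vi, mi in zip(s2, m)]                     # s_i^2 / m_i
    w = [1.0 / qi for qi in q]                                 # (7.6)
    a = sum(wi * yi for wi, yi in zip(w, means)) / sum(w)      # (7.4)
    S = sum(wi * (yi - a) ** 2 for wi, yi in zip(w, means))    # (7.7)
    # (7.20): [s^2/m]^2 / [s^4 / m^2 (m - 1)]
    nu2 = sum(q) ** 2 / sum(qi ** 2 / (mi - 1) for qi, mi in zip(q, m))
    return S / (n - 1), n - 1, nu2

# Hypothetical data with equal group sizes and equal sample variances.
ratio, nu1, nu2 = unpooled_f_statistic([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(ratio, nu1, nu2)
```

In this balanced example the approximate $\nu_2$ of (7.20) coincides with the pooled value $[m_i - 1]$; with unequal sample variances it is smaller.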


7.3 More Than One Variable Subject to
Error (k > 1)

We will show in this subsection that the extension of the above results to the general case in which more than one variable is subject to error involves only the use of (4.1) for defining $S$ and the replacement of $s_{\eta i}^2/m_i$ by $s_i^2$ as defined in (4.2); we obtain in this way a good approximation to the distribution of $S$. However, since the estimated values, $a_p$, are not linear functions of the errors of the $k$ random variables, we can obtain only rough approximations to the distributions of $(a_p - \alpha_p)$ in this general case. In the particular case of fitting a straight line to observations on $k = 2$ random variables, it is shown in Section 12, however, that Wald's method of defining estimates


29/ B. L. Welch, "Further Note on Mrs. Aspin's Tables and on Certain Approximations to the Tabled Function," Biometrika, vol. 36, 1949, pp. 293-296.

30/ Alice A. Aspin, "Tables for Use in Comparisons Whose Accuracy Involves Two Variances, Separately Estimated," Biometrika, vol. 36, pp. 290-296, 1949.
$a'$ and $b'$ of the two parameters $\alpha$ and $\beta$ leads to exact distributions for $(a' - \alpha)$ and for $(b' - \beta)$, but Wald's method may be used only when the population variances $\sigma_{\eta i}^2$ and $\sigma_{\epsilon i}^2$ are independent of $i$.

For simplicity in presentation the following discussion will be limited to the case of fitting a straight line with both variables subject to error; the extension to the general case discussed in Section 4 is straightforward. We will consider the distribution of $S$ in two artificial limiting cases (both involving implicitly the assumption that $m_i$ is infinite), and we will in this way obtain an approximation to the distribution of $S$ when $m_i$ is large but finite.


We will consider first the somewhat artificial case in which $\sigma_{\eta i}^2$, $\sigma_{\epsilon i}^2$, and $\rho_i$ are assumed to be known for each value of $i = 1$ to $n$, and only the parameters $a_\sigma$ and $b_\sigma$ and the adjusted values $X_{i\sigma}'$ and $Y_{i\sigma}'$ are estimated from the $n$ groups of data. Such a situation might actually arise in practice if the experimenter had made many (theoretically an infinite number of) simultaneous observations of $X_{it}$ and $Y_{it}$ in order to establish the values of $\sigma_{\eta i}$, $\sigma_{\epsilon i}$, and $\rho_i$, and then wished later to fit a relatively small sample of data from the same populations to a straight line. We will use the subscript $\sigma$ to distinguish this case from the solution described in Section 2 involving the values $s_{\eta i}$, $s_{\epsilon i}$, and $r_i$ obtained from the sample being fitted. In the present case the minimized sum $S_\sigma$ may be expressed:

$$S_\sigma = \left[\frac{(Y_i - a_\sigma - b_\sigma X_i)^2}{\sigma_i^2}\right] \tag{7.23}$$

$$\sigma_i^2 = (\sigma_{\eta i}^2 - 2b_\sigma\rho_i\sigma_{\eta i}\sigma_{\epsilon i} + b_\sigma^2\sigma_{\epsilon i}^2)/m_i \tag{7.24}$$

We may also express $S_\sigma$ in the following form:

$$S_\sigma = \left[\frac{m_i}{1 - \rho_i^2}\left\{\frac{(Y_i - Y_{i\sigma}')^2}{\sigma_{\eta i}^2} - \frac{2\rho_i(Y_i - Y_{i\sigma}')(X_i - X_{i\sigma}')}{\sigma_{\eta i}\sigma_{\epsilon i}} + \frac{(X_i - X_{i\sigma}')^2}{\sigma_{\epsilon i}^2}\right\}\right] \tag{7.25}$$

The equivalence of (7.23) and (7.25) may be established by substituting the adjusted values $X_{i\sigma}'$ and $Y_{i\sigma}'$ appropriate to this case in (7.25). Hotelling 17/ has established the invariance of the magnitude of quadratic forms like those in (7.25) to a rotation of the coordinate axes. Thus,
consider a rotation of the $X$, $Y$ axes by an angle $\theta$ to new axes $U$, $V$, respectively, where positive $\theta$ corresponds to counterclockwise rotation, and

$$\tan 2\theta = \frac{2\rho_i\sigma_{\eta i}\sigma_{\epsilon i}}{(\sigma_{\epsilon i}^2 - \sigma_{\eta i}^2)} \tag{7.26}$$

If we assume that $\rho_i = \rho$ and $\sigma_{\epsilon i}^2 = C^2\sigma_{\eta i}^2$, then $\tan 2\theta = 2\rho C/(C^2 - 1)$; in this case, since such a rotation is independent of $i$, the errors $u_i = \epsilon_i\cos\theta + \eta_i\sin\theta$ and $v_i = -\epsilon_i\sin\theta + \eta_i\cos\theta$ in each group will be uncorrelated in the new coordinate system† and we may write:

$$S_\sigma = \left[\frac{m_i(V_i - V_{i\sigma}')^2}{\sigma_{vi}^2} + \frac{m_i(U_i - U_{i\sigma}')^2}{\sigma_{ui}^2}\right] \tag{7.27}$$

The above is now in exactly the same form as the minimized sum $S$ studied by Deming. 7/ Deming fitted a straight line on the assumption that $\sigma_u$ and $\sigma_v$ were known constants, and that $\rho = 0$, and proved for this case that $S$ is distributed like $\chi^2$ with $(n - 2)$ degrees of freedom, 31/ i.e., that $S_\sigma/(n - 2)$ is distributed like $F(n - 2, \infty)$. In view of the above invariant transformation, it appears that Deming's results will also apply when $\rho$ is different from zero provided $\rho_i = \rho$ and $\sigma_{\epsilon i}^2 = C^2\sigma_{\eta i}^2$.
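The claim that the rotation defined by $\tan 2\theta = 2\rho C/(C^2 - 1)$ removes the correlation between the rotated errors can be checked numerically. The Python sketch below evaluates $E(u_i v_i)$ for the rotated errors; the function name and the numerical values of $\rho$, $C$, and $\sigma_{\eta i}$ are hypothetical.

```python
import math

def rotated_error_covariance(rho, C, sigma_eta):
    """E(u v) after rotating by theta with tan(2 theta) = 2 rho C / (C^2 - 1)."""
    sigma_eps = C * sigma_eta                      # sigma_eps^2 = C^2 sigma_eta^2
    theta = 0.5 * math.atan2(2.0 * rho * C, C * C - 1.0)
    s, c = math.sin(theta), math.cos(theta)
    # u = eps cos(theta) + eta sin(theta),  v = -eps sin(theta) + eta cos(theta)
    return ((sigma_eta ** 2 - sigma_eps ** 2) * s * c
            + rho * sigma_eps * sigma_eta * (c * c - s * s))

# Hypothetical values; the result should vanish for any sigma_eta.
cov_a = rotated_error_covariance(rho=0.9, C=2.0, sigma_eta=1.5)
cov_b = rotated_error_covariance(rho=0.9, C=2.0, sigma_eta=7.0)
print(cov_a, cov_b)
```

Because the angle depends only on $\rho$ and $C$, the same rotation decorrelates the errors in every group even when $\sigma_{\eta i}$ varies with $i$.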


† Note that $E(u_i v_i) = (\sigma_{\eta i}^2 - \sigma_{\epsilon i}^2)\sin\theta\cos\theta + \rho\,\sigma_{\epsilon i}\sigma_{\eta i}(\cos^2\theta - \sin^2\theta) = \sigma_{\eta i}^2\{(1 - C^2)\sin\theta\cos\theta + \rho C(\cos^2\theta - \sin^2\theta)\} = 0$ if $\tan 2\theta = 2\rho C/(C^2 - 1)$. Note that this is less restrictive than assuming that the population variances are the same for each group since $\sigma_{\eta i}^2$ may vary with $i$.

31/ See the discussion and accompanying references in reference 7 on pages 18, 23, 27, 141, and 230. For the case of fitting an arbitrary functional relation to random variables $U$ and $V$ with uncorrelated errors, see W. E. Deming, "On the Application of Least Squares - III. A New Property of Least Squares," Phil. Mag., Ser. 7, vol. XIX, p. 389, Supplement, February, 1935.
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We  will  consider  next  the  even  more  artificial  case  in  which 

the  population  parameters  X.  , Y.  , a,  and  p are  known  constants, 

2 2 lo  lo  I 

but  in  which  s s and  r.  are  estimated  from  the  samples.  It  will 
Tp.  a 1 

be  convenient  in  this  case  to  introduce  the  following  notation: 


F =Y-a-6X 
Pi  i ^ i 

(7.28) 

22^^  Z 2 

= s . - 2p  r.  s . s , + p s . 

pi  T)1  1 T|1  €1  €1 

(7.29) 

22^^  Z Z 

= (T  . - 2p  p.  cr  . cr  . + p cr  . 

Pi  Tjl  1 T|1  €1  a 

(7.30) 

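$$S_o = \left[\frac{m_i (Y_i - \alpha - \beta X_i)^2}{s_{\beta i}^2}\right] = \left[\frac{m_i \{(Y_i - Y_{io}) - \beta (X_i - X_{io})\}^2}{s_{\beta i}^2}\right] \qquad (7.31)$$

$$S_{\sigma o} = \left[\frac{m_i \{(Y_i - Y_{io}) - \beta (X_i - X_{io})\}^2}{\sigma_{\beta i}^2}\right] \qquad (7.32)$$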


The expressions on the right of (7.31) and (7.32) may be obtained by
subtracting $(Y_{io} - \alpha - \beta X_{io}) = 0$ from
$Y_i - \alpha - \beta X_i$. It is now obvious that $S_{\sigma o}$ is the sum
of the squares of n variables each of which is normally and independently
distributed about zero with unit variance; thus $S_{\sigma o}$ is distributed
as $\chi^2$ with n degrees of freedom. We may now apply Box's theorem to the
above expressions in essentially the same way it was applied in the preceding
subsection, and thus find that $S_o/n$ is distributed approximately as
$F(n, \nu_{2o})$ where:


’'20  = 


(7. 33) 


When all of the $m_i$ are large, b approaches $\beta$, $s_{\beta i}^2$
approaches $s_{F i}^2$ and $\nu_{2o}$ approaches $\nu_2$ as defined in (7.2).
It should be noted that this last result is valid for completely arbitrary
values of $\sigma_{\eta i}^2$, $\sigma_{\epsilon i}^2$, and $\rho_i$, and this
suggests that the requirement that $\rho_i = \rho$ and
$\sigma_{\epsilon i}^2 = C^2 \sigma_{\eta i}^2$ imposed in deriving the
distribution of $S_\sigma$ may not be necessary when all of the $m_i$ are
large. In the special case where we may assume that $\sigma_{\beta i}^2$ has a
constant value $\sigma_\beta^2$ independent of i, we may set
$S_o = S_{\sigma o}/(s_\beta^2/\sigma_\beta^2)$ and use the same arguments as
were used in subsection 7.1 to show that $S_o/n$ is distributed in this case
exactly as $F(n, [m_i - 1])$.

These estimates are defined in (2.2) and (2.4).

If we attempt to apply the above distributions for $S_\sigma$ and for $S_o$
to the determination of the distribution of S in our more general formulation
in which the values of $s_{\eta i}^2$, $s_{\epsilon i}^2$, $r_i$, $X'_i$,
$Y'_i$, a, and b must all be estimated from the data, we find that the n
separate terms in S are no longer independent as they were in $S_o$, and the
weights are now random variables instead of being known constants as they
were in $S_\sigma$. Nevertheless, since the estimated values $s_{\eta i}^2$,
$s_{\epsilon i}^2$, $r_i$, $X'_i$, $Y'_i$, a and b all approach the constant
values $\sigma_{\eta i}^2$, $\sigma_{\epsilon i}^2$, $\rho_i$, $X_{io}$,
$Y_{io}$, $\alpha$, and $\beta$, respectively, as $m_i$ is allowed to increase
without limit, it appears that S will be approximately equal to both $S_o$
and $S_\sigma$ when all of the $m_i$ are large. Thus, since all of the above
discussion may be extended without formal difficulties to the general case
discussed in Section 4, we may expect $S/\nu_1$ to be approximately
distributed as $F(\nu_1, \nu_2)$ with $\nu_1$ defined by (7.1), and $\nu_2$
defined by either (7.2) or (7.3), and this approximation should be better the
larger the values of all of the $m_i$.

By using $\nu_1 = n - u$ in $F(\nu_1, \nu_2)$, approximate allowance has been
made for the fact that the sum of the numerators of the n terms in S,
normalized by their respective variances, has only (n - u) degrees of
freedom since the estimated values of u parameters were determined in
minimizing S; and by using $\nu_2$ as defined in (7.2) or (7.3), approximate
allowance has been made for the degrees of freedom in estimating the n
variances $\sigma_{F i}^2$. Thus, to a first approximation, allowance has been
made for the variances of all of the random variables entering the problem.
Even for small values of $m_i$, changes in these random variables will affect
our approximate distributions in the same directions as they affect the
exact distributions, and thus our approximate theory will always provide a
dependable, even if not exact, guide to the analyst. Even when the exact
distributions become available in a usable form, it seems likely that the
above-described approximations will continue to be useful because of their
simplicity.
8. THE SIMPLEST LEAST SQUARES PROBLEM:
ONE VARIABLE AND ONE PARAMETER

8.1 Without "Systematic" Errors

This is the simplest of all least squares problems, and will be discussed in
some detail since it illustrates, under the simplest possible conditions,
many of the characteristics and limitations of our least squares solution.
We assume in this subsection that $E(Y_{it}) = Y_{io} = \alpha$, and thus, by
assumption, exclude any "systematic" errors; also
$E\{(Y_{it} - Y_{io})^2\} = \sigma_{\eta i}^2$.

$$F_i = Y_i - a \qquad (8.1)$$

$$(1/w_i) = s_{F i}^2 = s_{Y_i}^2 = \frac{1}{m_i}\, s_{\eta i}^2 \qquad (8.2)$$

The estimate of variance (8.2) will be called the sample "within group"
estimate.


■*'10 

(8.3) 

All  = 

(8.4) 

= [w.(Y.  - aj] 

(8.  5) 

$$a = \frac{[w_i Y_i]}{[w_i]} = \frac{[m_i Y_i / s_{\eta i}^2]}{[m_i / s_{\eta i}^2]} \qquad \text{(Variances not pooled)} \qquad (8.6)$$


We see by (8.6) that least squares leads in this case to the weighted mean
with the weights equal to the reciprocal of the variances of the group means
determined from the sample within group estimates of variance. Let us suppose
that it is reasonable on technical grounds to assume that the variances
$s_{\eta i}^2$ for i = 1 to n may be considered to be samples from the same
statistical parent population. In this case we may pool these n estimates and
obtain a value $s_\eta^2$ which does not depend on i; this common value
cancels out in (8.6) which now becomes

$$a_p = \frac{[m_i Y_i]}{[m_i]} \qquad \text{(Variances pooled)} \qquad (8.7)$$


We see exhibited here the two basic statistical properties of our system of
weighting: (8.6) and (8.7) both show that the relative weights to be assigned
are directly proportional to the number, $m_i$, of observations averaged in
obtaining the mean values $Y_i$, while (8.6) makes allowance in addition for
variations in the observational conditions which might be present in
determining these n different mean values. For example, suppose verniers
were available on the measuring instruments for i = 1 and 2 and none for
i = 3, 4 and 5; this could lead to values of $s_{\eta 1}^2$ and
$s_{\eta 2}^2$ which are systematically smaller than the values
$s_{\eta 3}^2$, $s_{\eta 4}^2$ and $s_{\eta 5}^2$, even with $m_i$ the same
for all 5 groups, and our least squares method has been formulated so as to
give an appropriate additional amount of weight to $Y_1$ and $Y_2$ as
compared to $Y_3$, $Y_4$ and $Y_5$ in this situation.

The  following  equations  apply  whether  or  not  the  variances 
are  pooled: 


$$S(a) = [w_i (Y_i - a)^2] \qquad (8.8)$$


^1  = 


1 

^1 


[w.] 


(8.9) 


$$s_a^2 = \frac{S}{(n - 1)[w_i]} = \frac{[w_i (Y_i - a)^2]}{(n - 1)[w_i]} \qquad (8.10)$$


Note that $s_a$ is an estimate of the standard error of the weighted mean, a,
obtained from the n independent groups of observations of Y. Our least
squares solution can thus be expressed in the following form:

$$\alpha = a \pm s_a \qquad (8.11)$$

In this particular case, on the assumption that the population mean $Y_{io}$
is equal to the "true" mean $\alpha$, a is an unbiased estimate of the "true"
mean value $\alpha$, and $s_a^2$ is an unbiased estimate of the variance of a.
We will see as we proceed, however, that the assumption
$E(Y_{it}) = \alpha$ will not always be realized in practice because of the
presence of systematic errors.


We will now introduce several numerical examples in order to illustrate more
clearly the nature of our least squares solution of the one variate problem.
In order to ensure that the data are from normal populations, we will
construct our observations by using the table of random normal deviates in
the Appendix to reference 7. This also has the advantage that we will know in
these illustrative examples the population mean $Y_{io} = \alpha$ and
population variances $\sigma_{\eta i}^2$; thus, for all of the examples in
this section we take $Y_{io} = \alpha = 17$, and for the example in Table 8.1
we take $\sigma_{\eta i}^2 = 1$ for i = 1 to 5. Thus the 5 groups of
observations in Table 8.1 might correspond to observations made by 5
different observers in 5 different laboratories.

It may be noted in passing that $s_{\eta i}^2$ should normally be calculated
by means of the following exactly equivalent formula rather than directly
from its definition (2.2):


2 


s . 


(8.12) 


On modern electrical calculators the two sums in (8.12) may be obtained in a
single operation; when (8.12) is used it will be necessary to carry more
significant figures than would be the case if (2.2) were used, but this is
readily done on a modern electrical calculator.
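A minimal sketch of a two-sum computation of this kind is given below. It assumes the usual computing form $s^2 = \{\sum Y^2 - (\sum Y)^2/m\}/(m - 1)$, which is an assumption made here for illustration, and compares it with the direct definition using the first column of Table 8.1. The variable names are illustrative only.

```python
# Observations for group i = 1 of Table 8.1.
y = [18.95, 18.57, 15.81, 15.53, 17.35, 15.93, 17.54, 16.54, 17.81, 18.16]
m = len(y)

# Direct definition: sum of squared deviations from the group mean.
mean = sum(y) / m
s2_definition = sum((v - mean) ** 2 for v in y) / (m - 1)

# Two-sum form: both sums can be accumulated in a single pass.
sum_y = 0.0
sum_y2 = 0.0
for v in y:
    sum_y += v
    sum_y2 += v * v
s2_two_sums = (sum_y2 - sum_y ** 2 / m) / (m - 1)

print(f"definition: {s2_definition:.5f}, two sums: {s2_two_sums:.5f}")
```

The two sums are of order 3000 while their difference is of order 13, which illustrates why more significant figures must be carried when the two-sum form is used.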


From the data in Table 8.1 and equations (8.6), (8.8) and (8.10) we obtain:
a = 17.223, S = 2.939, and $s_a$ = 0.124; thus:

$$\alpha = 17.223 \pm 0.124 \qquad \text{(Variances not pooled)} \qquad (8.13)$$
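The arithmetic leading to (8.13) can be retraced from the group means and within-group variances listed in Table 8.1. The sketch below is illustrative; the variable names are not the paper's.

```python
import math

# Group means Y_i and within-group variances s_eta_i^2 from Table 8.1.
means = [17.219, 17.326, 17.277, 16.568, 17.402]
variances = [1.45968, 0.79054, 0.74051, 1.74686, 1.03440]
m = 10              # observations per group
n = len(means)      # number of groups

# Weights (8.2): reciprocal of the estimated variance of each group mean.
w = [m / s2 for s2 in variances]

# Weighted mean (8.6), minimized sum (8.8) and standard error (8.10).
a = sum(wi * yi for wi, yi in zip(w, means)) / sum(w)
S = sum(wi * (yi - a) ** 2 for wi, yi in zip(w, means))
s_a = math.sqrt(S / ((n - 1) * sum(w)))

print(f"a = {a:.3f}, S = {S:.3f}, s_a = {s_a:.3f}")
```

Run on the tabulated values, this agrees with the quoted a and S, and with the quoted $s_a$ to within about one unit in the last figure.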
Table 8.1

$Y_{io} = \alpha = 17$;  $\sigma_{\eta i}^2 = \sigma_\eta^2 = 1$;  $m_i = m = 10$;  $n = 5$

t                       i = 1      i = 2      i = 3      i = 4      i = 5

1                       18.95      18.87      17.63      14.08      18.72
2                       18.57      18.41      18.17      18.53      16.92
3                       15.81      16.63      18.25      16.49      18.29
4                       15.53      16.75      16.76      15.98      16.04
5                       17.35      16.75      16.69      16.22      17.91
6                       15.93      17.57      16.66      16.02      15.48
7                       17.54      18.27      17.53      17.17      17.70
8                       16.54      16.41      15.62      18.31      18.21
9                       17.81      16.98      17.19      17.30      17.40
10                      18.16      16.62      18.27      15.58      17.35

$Y_i$                  17.219     17.326     17.277     16.568     17.402
$s_{\eta i}^2$        1.45968    0.79054    0.74051    1.74686    1.03440
$s_{Y_i}^2$          0.145968   0.079054   0.074051   0.174686   0.103440
$w_i$                 6.85082   12.64958   13.50421    5.72456    9.66744

$Y_i - Y'_i$           -0.004      0.103      0.054     -0.655      0.179
$G_i^2$               0.00011    0.13420    0.03938    2.45598    0.30975
$p'_i$                   0.99       0.73       0.85       0.15       0.59

$Y_i - Y_{io}$          0.219      0.326      0.277     -0.432      0.402
$G_{io}^2$            0.32857    1.34435    1.03616    1.06834    1.56230
$p'_{io}$                0.58       0.28       0.34       0.33       0.24

$Y_i - Y'_{ip}$         0.061      0.168      0.119     -0.590      0.244
$G_{ip}^2$            0.03223    0.24449    0.12267    3.01542    0.51573
$p'_{ip}$                0.86       0.63       0.74       0.12       0.49
Before accepting the above result, it is desirable to make tests to determine
whether some of the group means contain systematic errors. All of the tests
given in this section of the paper depend upon the assumption that the
observations are from normally distributed populations; in the illustrative
examples this is ensured since they were constructed from a table of random
normal deviates. For this problem $S = [G_i^2] = 2.93942$ and it was shown in
subsection 7.2 that an approximate test of the hypothesis that all 5 group
means are from populations with the same $\alpha$ but with possibly different
variances $\sigma_{\eta i}^2$ may be obtained by setting
$S/(n - 1) = 0.73485 = F(4, \nu_2, p)$ where
$\nu_2 = (m - 1)[s_{\eta i}^2]^2/[s_{\eta i}^4] = 40.3803$. From the graphs in
reference 19 we find $p \cong 0.58$. Note that p is approximately the
probability of observing, in repeated random sampling with samples of this
size from populations with arbitrary variances, a value of S larger than the
value actually observed. Since p = 0.58 is much larger than the level 0.05
arbitrarily adopted† throughout this paper as the minimum permissible value
of p for accepting the hypothesis, we conclude that there is no statistical
evidence that these 5 groups of data are not from populations with the same
mean $\alpha$.
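For readers without access to the graphs of reference 19, the test just described can be retraced as sketched below. The tail probability is computed from the regularized incomplete beta function by a standard continued fraction; the helper names (`_betacf`, `betainc_reg`, `f_upper_tail`) are illustrative choices made here, not part of the paper.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        # Even step of the recurrence.
        aa = k * (b - k) * x / ((a - 1.0 + k2) * (a + k2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + k) * (a + b + k) * x / ((a + k2) * (a + 1.0 + k2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_upper_tail(f, nu1, nu2):
    """P(F > f) for the variance ratio with (nu1, nu2) degrees of freedom."""
    return betainc_reg(nu2 / 2.0, nu1 / 2.0, nu2 / (nu2 + nu1 * f))

# Group means and within-group variances from Table 8.1.
means = [17.219, 17.326, 17.277, 16.568, 17.402]
variances = [1.45968, 0.79054, 0.74051, 1.74686, 1.03440]
m, n = 10, 5

w = [m / s2 for s2 in variances]
a = sum(wi * yi for wi, yi in zip(w, means)) / sum(w)
S = sum(wi * (yi - a) ** 2 for wi, yi in zip(w, means))

F_obs = S / (n - 1)
nu2 = (m - 1) * sum(variances) ** 2 / sum(s2 ** 2 for s2 in variances)
p = f_upper_tail(F_obs, n - 1, nu2)

print(f"S/(n-1) = {F_obs:.5f}, nu2 = {nu2:.4f}, p = {p:.2f}")
```

Where a statistical library is available, its F-distribution survival function (for example `scipy.stats.f.sf`) could replace the hand-written helpers.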

We may also examine the estimated errors $Y_i - Y'_i$ as given in Table 8.1;
for the one variate problem $Y'_i = a$. The probabilities $p'_i$ provide a
more detailed, although less accurate, check for the presence of systematic
errors. Thus, the probabilities $p'_i$ may be obtained by setting
$G_i^2 = F(1, 9, p'_i)$ and, in the present illustrative problem, since
$\alpha = 17$ is known, we may also determine probabilities $p'_{io}$ by
setting $G_{io}^2 = F(1, 9, p'_{io})$. Here $G_{io}^2$ is the value of
$G_i^2$ obtained with $Y'_i$ replaced by $\alpha$. The probability $p'_{io}$
represents the probability of observing, in repeated sampling from the
$i$th population with samples of this size, a value of $G_{io}^2$ larger
than the value actually observed. Since $G_i^2$ approaches $G_{io}^2$ as all
of the $m_i$ are allowed to increase without limit, we see that $p'_i$ will
also to this degree approximate $p'_{io}$ and thus the probabilities $p'_i$
provide rough indications of the separate reliabilities of the n means
$Y_i$. We see in Table 8.1 that none of the probabilities $p'_i$ are less
than 0.05, so we have no statistical reason to suspect any of these 5 means.
Note, however, that the $p'_i$ differ substantially from the corresponding
probabilities $p'_{io}$, and thus they should not be relied on except for a
rough check.

† Note that the level chosen in practice for rejecting the hypothesis should
be adopted in advance of making the test; the probability level actually
used should depend on the risk involved, and should be chosen with due regard
for the alternative actions to be taken if the hypothesis is either accepted
or rejected.

In view of the above checks we conclude that (8.13) represents an acceptable
solution to our problem.

It was shown in subsection 7.2 that $(n - 1)[w_i](a - \alpha)^2/S$ is
distributed approximately as F(1, n - 1). This result permits us to determine
a confidence band for $\alpha$:

$$Y^*\{100(1 - p)\%\} \cong a \pm \{S\,F(1, n - 1, p)/(n - 1)[w_i]\}^{1/2} = a \pm s_a \{F(1, n - 1, p)\}^{1/2} \qquad (8.14)$$

The result (8.14) is to be interpreted as follows. If n = 5 samples of
$m_i = 10$ individuals are observed repeatedly from these same 5 populations,
which are assumed to have the same population mean $\alpha$ but possibly
different variances, and if confidence bands as defined by (8.14) are
constructed for each such sampling, then as the number of such samplings is
increased without limit, it will be found that (approximately) a fraction
(1 - p) of the confidence bands so constructed will contain the population
mean $\alpha$. From reference 19 we find F(1, 4, 0.5) = 0.54863,
F(1, 4, 0.05) = 7.7086, and F(1, 4, 0.005) = 31.333 so that
$Y^*(50\%) \cong 17.131$ to 17.315; $Y^*(95\%) \cong 16.879$ to 17.567 and
$Y^*(99.5\%) \cong 16.529$ to 17.917. Note that for this particular sample
the 50% confidence band does not contain the population mean $\alpha = 17$,
but that the 95% and 99.5% confidence bands do contain the population mean.
All we can say is that the population mean will be found in approximately
100(1 - p)% of the confidence bands constructed in this manner.
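The three bands quoted above follow directly from (8.14) and the tabulated F values. A short illustrative sketch, with variable names chosen here:

```python
import math

a, s_a = 17.223, 0.124     # solution (8.13)
alpha_true = 17.0          # known only because the data were constructed

# Upper percentage points F(1, 4, p) as quoted from reference 19,
# keyed by the confidence level 100(1 - p) in percent.
f_points = {50.0: 0.54863, 95.0: 7.7086, 99.5: 31.333}

bands = {}
for level, f in f_points.items():
    half_width = s_a * math.sqrt(f)          # second form of (8.14)
    bands[level] = (a - half_width, a + half_width)

for level, (lo, hi) in bands.items():
    covered = lo <= alpha_true <= hi
    print(f"{level:5.1f}%: {lo:.3f} to {hi:.3f}  contains 17: {covered}")
```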

Suppose now that some physical theory indicates that the "true" value of
$\alpha = 16$. Since we can write:

$$(n - 1)[w_i](a - \alpha)^2/S \cong F(1, n - 1, p) \qquad (8.15)$$

and we know that $(n - 1)[w_i](a - 16)^2/S = 98 = F(1, n - 1, p)$, we find in
reference 19 that $p \cong 0.0006$. Thus we may conclude, on the assumption
that the theory is correct, either (a) that the observed sample contains
"systematic" errors or (b) that the observed sample just happened to have a
large deviation from the population mean, in fact in this case, so large that
a still larger deviation would be expected to be observed in random sampling
with a probability of only 0.0006. Alternatively, if the analyst had adopted
0.05 in advance as his hypothesis rejection level, these data would provide
a basis for rejecting the theory.
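The statistic in (8.15) for the hypothetical value 16 can be retraced from the weights of Table 8.1; the sketch below only compares it with the 5% point F(1, 4, 0.05) = 7.7086 quoted earlier and does not compute the tail probability itself.

```python
# Weights w_i from Table 8.1 and the solution (8.13).
weights = [6.85082, 12.64958, 13.50421, 5.72456, 9.66744]
a, S, n = 17.223, 2.93942, 5

alpha_theory = 16.0        # value suggested by the hypothetical theory
f_5_percent = 7.7086       # F(1, 4, 0.05) as quoted from reference 19

statistic = (n - 1) * sum(weights) * (a - alpha_theory) ** 2 / S
reject_at_5_percent = statistic > f_5_percent

print(f"statistic = {statistic:.1f}, reject at 0.05: {reject_at_5_percent}")
```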


Since the example in Table 8.1 was constructed by random sampling from 5
groups with the same normal distribution, i.e., $Y_{io} = 17$ and
$\sigma_{\eta i}^2 = 1$, we know in advance that the variances may be pooled.
It will nevertheless be instructive to apply Bartlett's test to these five
observed variances $s_{\eta i}^2$ in Table 8.1. Bartlett has shown for n
sample variances $s_{\eta i}^2$ from populations with the same population
variance $\sigma_\eta^2$ that B/(n - 1) is distributed approximately as
Fisher's variance ratio F(n - 1, ∞), where:

$$B = \frac{\ln 10}{C} \left\{ [m_i - 1] \log_{10} s_\eta^2 - [(m_i - 1) \log_{10} s_{\eta i}^2] \right\} \qquad (8.16)$$

$$C = 1 + \frac{1}{3(n - 1)} \left\{ \left[\frac{1}{m_i - 1}\right] - \frac{1}{[m_i - 1]} \right\} \qquad (8.17)$$

$$s_\eta^2 = \frac{[(m_i - 1)\, s_{\eta i}^2]}{[m_i - 1]} \qquad (8.18)$$


The above test is useful even for small values of $m_i$, say 5 or more. If we
apply this test to the variances in Table 8.1 we find $s_\eta^2 = 1.15440$,
C = 1.04444, B/(n - 1) = 0.61073 = F(4, ∞, p); and from reference 19 we find
p = 0.66. Since this is much larger than the level 0.05 arbitrarily adopted
throughout this paper as the minimum permissible value of p for accepting the
hypothesis, we conclude that there is no statistical evidence that these 5
sample variances are not from populations with the same variance. Note that
B/(n - 1) is only approximately distributed as F(n - 1, ∞); however, Thompson
and Merrington have developed tables from which the exact 5% and 1%
significance levels for B may be determined, and these will be useful when p
is near the rejection level chosen. Reference may also be made to a paper by
Hartley in which a test for heterogeneity of variance is given which involves
only the ratio of the largest to the smallest variances in the n groups; this
test is more convenient but less powerful than Bartlett's test, and is
recommended only when $m_i$ has the same value m for all n groups.

M. S. Bartlett, "Properties of Sufficiency and Statistical Tests," Proc. of
the Royal Society of London, vol. 160A (1937), pp. 268-282.
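The Bartlett statistic for Table 8.1 can be retraced as sketched below. Natural logarithms are used, which is equivalent to common logarithms multiplied by ln 10; the function name `bartlett_ratio` is an illustrative choice made here.

```python
import math

def bartlett_ratio(variances, dof):
    """Return (pooled variance, C, B/(n - 1)) for Bartlett's test."""
    n = len(variances)
    total_dof = sum(dof)
    # Pooled variance: degrees-of-freedom weighted average.
    pooled = sum(d * s2 for d, s2 in zip(dof, variances)) / total_dof
    # Correction factor C.
    C = 1.0 + (sum(1.0 / d for d in dof) - 1.0 / total_dof) / (3.0 * (n - 1))
    # B in natural logarithms.
    B = (total_dof * math.log(pooled)
         - sum(d * math.log(s2) for d, s2 in zip(dof, variances))) / C
    return pooled, C, B / (n - 1)

# Within-group variances from Table 8.1, each with m - 1 = 9 degrees of freedom.
variances = [1.45968, 0.79054, 0.74051, 1.74686, 1.03440]
pooled, C, ratio = bartlett_ratio(variances, [9] * 5)

print(f"pooled = {pooled:.5f}, C = {C:.5f}, B/(n-1) = {ratio:.4f}")
```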

In some problems it may happen that a test is required of the statistical
significance simply of the departure of the largest of a set of variances.
For example, it might be reasonable to pool the remaining variances if the
largest of the set were eliminated. Cochran has developed such a test, and
appropriate tables for the application of this test are given in
reference 11.

Using the pooled variance $s_\eta^2 = 1.15440$, we obtain:

$$a_p = [Y_i]/n = 17.158 \qquad (8.19)$$

$$S = m[(Y_i - a_p)^2]/s_\eta^2 = 3.93054 \qquad (8.20)$$

$$s_a^2 = [(Y_i - a_p)^2]/n(n - 1) = 0.022687 \qquad (8.21)$$

$$\alpha = 17.158 \pm 0.151 \qquad \text{(Variances pooled)} \qquad (8.22)$$
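The pooled results (8.19) to (8.22) can be retraced from the group means alone, since with a common variance and a common m every group carries the same weight. An illustrative sketch:

```python
import math

means = [17.219, 17.326, 17.277, 16.568, 17.402]   # Y_i from Table 8.1
s2_pooled = 1.15440                                 # pooled variance
m, n = 10, len(means)

a_p = sum(means) / n                                # (8.19)
sum_sq = sum((y - a_p) ** 2 for y in means)
S = m * sum_sq / s2_pooled                          # (8.20)
s_a_squared = sum_sq / (n * (n - 1))                # (8.21)
s_a = math.sqrt(s_a_squared)                        # half-width in (8.22)

print(f"a_p = {a_p:.3f}, S = {S:.5f}, s_a^2 = {s_a_squared:.6f}, s_a = {s_a:.3f}")
```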


Catherine M. Thompson and Maxine Merrington, "Tables for testing the
homogeneity of a set of estimated variances," Biometrika, Vol. 33 (1946),
pp. 296-304. These tables are also available in reference 9, page 198.

H. O. Hartley, "The maximum F-ratio as a short cut test for heterogeneity of
variance," Biometrika, Vol. 37 (1950), pp. 308-312.
In this case the exact results in section 7.1 apply and we may set
S/(n - 1) = 0.98263 = F(4, 45, p) and find p = 0.425; $Y^*(50\%)$ = 17.046 to
17.270; $Y^*(95\%)$ = 16.740 to 17.576 and $Y^*(99.5\%)$ = 16.315 to 18.001;
furthermore, the probability p = 0.0016 is obtained in this case using pooled
variances for an assumed "true" value $\alpha = 16$. The probabilities
$p'_{ip}$ corresponding to this case are also given in Table 8.1.

Thus we see that only slightly different results, (8.22) or (8.13), are
obtained in this example, depending upon whether the experimenter assumes
that the 5 population variances are the same or assumes that they may be
different. In general it is better to assume that the population variances
may be different unless there are good physical reasons, independent of the
data and independent of statistical tests on it, for expecting that the
population variances are the same. For example, it might be expected that the
variances would be the same for n groups of measurements made under similar
environmental conditions with apparatus of the same manufacture. However,
even in this case, the n population variances might be different if n
different observers were involved and it was established (possibly by
statistical tests for heterogeneity of variance or otherwise) that a
significant portion of the variance was contributed directly by the
observers.


8.2 Example with Different Population Variances
and without "Systematic" Errors

We will illustrate the results of the preceding subsection further by
considering a numerical example constructed in such a way that the population
variances are different. The numbers in Table 8.2 were obtained from the
table of random normal deviates in the Appendix to reference 7 using the
assumed population parameters: $Y_{io} = 17$,
$\sigma_{\eta 1}^2 = \sigma_{\eta 2}^2 = 1$, and
$\sigma_{\eta 3}^2 = \sigma_{\eta 4}^2 = \sigma_{\eta 5}^2 = 6.25$. Let us
assume for our discussion that the observations for the groups i = 1 and
i = 2 might have been made using instruments with verniers, while the
observations for the groups i = 3, 4 and 5 might have been made with
instruments not having verniers. Thus the experimenter would have a good a
priori physical reason, independent of an analysis of his data, to doubt
whether the variances would be the same for all 5 groups, although he might
expect the same variances in groups 1 and 2 and in groups 3, 4, and 5,
respectively.
Table 8.2

$Y_{io} = \alpha = 17$;  $\sigma_{\eta 1}^2 = \sigma_{\eta 2}^2 = 1$;  $\sigma_{\eta 3}^2 = \sigma_{\eta 4}^2 = \sigma_{\eta 5}^2 = 6.25$

t                       i = 1      i = 2      i = 3      i = 4      i = 5

1                       16.46      17.42      16.35      22.10      19.08
2                       16.79      18.67      14.45      14.30      18.45
3                       16.40      17.67      15.98      15.52      16.08
4                       15.41      17.06      13.95      17.70      17.65
5                       16.40      18.37      14.30      20.25      18.33
6                       17.22      16.43      17.00      14.57      15.63
7                       18.45      16.31      15.97      17.27      19.85
8                       16.16      17.79      14.60      18.70      12.70
9                       18.72      16.43      20.95      16.15      15.40
10                      17.93      16.00      14.83      17.02      15.98

$Y_i$                  16.994     17.215     15.838     17.358     16.915
$s_{\eta i}^2$        1.14036    0.84249    4.24308    6.09466    4.60936
$s_{Y_i}^2$          0.114036   0.084249   0.424308   0.609466   0.460936
$w_i$                 8.76916   11.86958    2.35677    1.64078    2.16949

$Y_i - Y'_i$           -0.012      0.209     -1.168      0.352     -0.091
$G_i^2$               0.00129    0.51793    3.21576    0.20317    0.01801
$p'_i$                   0.97       0.49       0.11       0.66       0.90

$Y_i - Y_{io}$         -0.006      0.215     -1.162      0.358     -0.085
$G_{io}^2$            0.00032    0.54867    3.18223    0.21029    0.01567
$p'_{io}$                0.99       0.48       0.11       0.66       0.90
If we apply Bartlett's test for homogeneity to the 5 variances in Table 8.2,
we find $s_\eta^2 = 3.386$, B/(n - 1) = 2.92 = F(4, ∞, p); and from
reference 19 we find p = 0.02. Since this is less than the probability level
0.05 arbitrarily chosen throughout this paper for rejecting the hypothesis,
we conclude (as was to be expected in view of the way the data in Table 8.2
were obtained) that it is statistically unlikely that these 5 variances are
samples from populations with the same variance.

Inspection of the observed variances $s_{\eta i}^2$ in Table 8.2 indicates
that the use of the vernier evidently reduced the variances for groups 1 and
2 substantially. To test the statistical significance of this difference, we
may pool the variances obtained with and without the vernier: $s_\eta^2$
(with vernier) = 0.99143 and $s_\eta^2$ (without vernier) = 4.98237. On the
assumption that they are samples from normal populations with the same
variance $\sigma_\eta^2$, the ratio $s_\eta^2$ (without vernier)/$s_\eta^2$
(with vernier) would be distributed as $F(\nu_1, \nu_2)$ with
$\nu_1 = \sum_{i=3}^{5} (m_i - 1) = 27$ (without vernier) and
$\nu_2 = \sum_{i=1}^{2} (m_i - 1) = 18$ (with vernier). If we set
4.98237/0.99143 = 5.025 = F(27, 18, p) we find by reference 19 that the
probability of observing a ratio as large or larger than this by chance, if
the samples were actually from normal populations with the same variance, is
p = 0.0004. Thus clear physical, and strongly supporting statistical,
evidence is available as to the practical importance of the use of the
vernier. This example also illustrates the importance of using all of the
physical and statistical information available in drawing conclusions from
the analysis; in this example our confidence in rejecting the hypothesis that
the variances were equal increased from 0.98 (based on Bartlett's test) to
0.9996 when the prior knowledge was added as to the particular groups for
which the verniers were used.
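The partially pooled variances and their ratio can be retraced from Table 8.2 as sketched below; the helper name `pooled_variance` is illustrative.

```python
# Within-group variances from Table 8.2; each has m - 1 = 9 degrees of freedom.
variances = [1.14036, 0.84249, 4.24308, 6.09466, 4.60936]
dof = [9, 9, 9, 9, 9]

def pooled_variance(indices):
    """Degrees-of-freedom weighted average of the selected variances."""
    total = sum(dof[i] for i in indices)
    return sum(dof[i] * variances[i] for i in indices) / total

with_vernier = [0, 1]          # groups i = 1, 2
without_vernier = [2, 3, 4]    # groups i = 3, 4, 5

s2_with = pooled_variance(with_vernier)
s2_without = pooled_variance(without_vernier)
ratio = s2_without / s2_with
nu1 = sum(dof[i] for i in without_vernier)
nu2 = sum(dof[i] for i in with_vernier)

print(f"ratio = {ratio:.3f} on ({nu1}, {nu2}) degrees of freedom")
```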

Consider now the following three estimates of $\alpha$: (a) with no pooling
of variances $\alpha = 17.006 \pm 0.192$; (b) with partially pooled variances
$\alpha = 17.012 \pm 0.181$; and (c) using only the data from groups 1 and 2
for which the vernier was used $\alpha = 17.105 \pm 0.110$. Since the estimate
(b) above makes use of all of the physical and statistical evidence available
to the analyst, it is to be preferred over the estimate (c) even though the
latter has a smaller estimated standard error. On the basis of the above
description of the experiment there is no way of choosing between the
estimates (a) and (b); however, the estimate (a) is more conservative and
should certainly be adopted if the experimenter has reasons (different
observers, for example) to doubt that groups 1 and 2 and groups 3, 4, and 5,
respectively, have the same population variances.
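The three estimates can be retraced from the means and variances of Table 8.2 using (8.6) and (8.10) with the appropriate weights. The sketch below is illustrative; `weighted_fit` is a name chosen here.

```python
import math

means = [16.994, 17.215, 15.838, 17.358, 16.915]      # Y_i, Table 8.2
variances = [1.14036, 0.84249, 4.24308, 6.09466, 4.60936]
m = 10

def weighted_fit(y, w):
    """Weighted mean (8.6) and its estimated standard error (8.10)."""
    n = len(y)
    a = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    S = sum(wi * (yi - a) ** 2 for wi, yi in zip(w, y))
    return a, math.sqrt(S / ((n - 1) * sum(w)))

# Pooled variances; a simple average suffices because every group has m = 10.
s2_with = sum(variances[:2]) / 2       # groups 1 and 2 (vernier)
s2_without = sum(variances[2:]) / 3    # groups 3, 4 and 5 (no vernier)

fit_a = weighted_fit(means, [m / s2 for s2 in variances])                 # (a)
fit_b = weighted_fit(means, [m / s2_with] * 2 + [m / s2_without] * 3)     # (b)
fit_c = weighted_fit(means[:2], [m / s2_with] * 2)                        # (c)

for label, (a, s_a) in zip("abc", (fit_a, fit_b, fit_c)):
    print(f"({label}) {a:.3f} +/- {s_a:.3f}")
```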


8.3 Example with "Systematic" Bias and
Random "Systematic" Errors

We assume initially in this subsection that each group of observations
(i = 1 to n) is subject to a systematic bias $\nu_o$ and a random
"systematic" error $\nu_i$, i.e.,
$Y_{it} = Y_{io} + \nu_{it} = \alpha + \nu_o + \nu_i + \eta_{it}$; as $m_i$
approaches infinity $E(Y_{it}) = Y_{io} + \nu_i = \alpha + \nu_o + \nu_i$
and, as n approaches infinity, $E(Y_i) = Y_{io} = \alpha + \nu_o$. Thus
$\nu_o$ is an assumed

systematic bias which has the same value for all n groups and for each
observation within each group. A constant systematic bias of this kind cannot
be detected by least squares. In fact it can be shown more generally that
such systematic biases occurring in one or more of the observed variables in
the multivariate model of Section 4 cannot be detected by least squares. The
proof of this general statement follows from the fact that our least squares
solution is invariant to a translation of the coordinate axes and a
systematic bias in an observed variable is equivalent to a translation of the
corresponding coordinate axis by the amount of this systematic bias. Since
such systematic bias cannot be detected by least squares, the analyst should
remember that his solution of the above one variate problem can yield only an
estimate of the population mean $Y_{io} = \alpha + \nu_o$ or, more generally,
that the population means in the multivariate problem which he can estimate
by a least squares analysis may actually be the sum of a "true" value plus an
unknown systematic bias $\nu_{jo}$ present in all of the observations of the
$j$th variable. Thus we conclude that other means than least squares must be
used to detect such systematic biases. Throughout the remainder of this paper
it will be convenient to eliminate explicit allowance for such systematic
biases, and the student should remember that his least squares analysis leads
only to population mean values and not necessarily to the "true" values.
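The translation argument can be illustrated numerically for the one variate problem. In the sketch below a constant bias is added to every group mean of Table 8.1; the bias value 0.5 is a hypothetical choice made only for illustration. The within-group variances, and hence the weights, are unaffected by a constant shift.

```python
means = [17.219, 17.326, 17.277, 16.568, 17.402]      # Y_i, Table 8.1
variances = [1.45968, 0.79054, 0.74051, 1.74686, 1.03440]
m = 10
w = [m / s2 for s2 in variances]

def fit(y, weights):
    """Weighted mean (8.6) and minimized sum of squares (8.8)."""
    a = sum(wi * yi for wi, yi in zip(weights, y)) / sum(weights)
    S = sum(wi * (yi - a) ** 2 for wi, yi in zip(weights, y))
    return a, S

bias = 0.5                                   # hypothetical systematic bias
a_clean, S_clean = fit(means, w)
a_biased, S_biased = fit([y + bias for y in means], w)

print(f"shift in a = {a_biased - a_clean:.6f}")
print(f"change in S = {S_biased - S_clean:.2e}")
```

The estimate moves by exactly the bias while S, and with it every test based on S, is unchanged; the data therefore carry no information about the bias.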
Our one variate model with random systematic errors may now be described as
follows: $Y_{it} = Y_{io} + \nu_{it} = Y_{io} + \nu_i + \eta_{it}$; as $m_i$
approaches infinity $E(Y_{it}) = Y_{io} + \nu_i$,
$E(\eta_{it}^2) = \sigma_{\eta i}^2$, $E(\eta_{it} \eta_{jt}) = 0$ for
$i \ne j$ and $E(\eta_{it} \eta_{iu}) = 0$ for $t \ne u$; and as n approaches
infinity $E(\nu_i^2) = \sigma_\nu^2$, $E(\nu_i \nu_j) = 0$ for $i \ne j$. The
values in Table 8.3 were obtained from a table of random normal deviates
using the population

parameters $\alpha = 17$, $\sigma_{\eta i}^2 = \sigma_\eta^2 = 1$,
$m_i = m = 10$ and $\sigma_\nu^2 = 9$. We will assume that the variances may
be pooled and the solution in this case is:

$$\alpha = 17.238 \pm 1.026. \qquad (8.23)$$

For this example $S = [G_i^2] = 467.62$, and we may test the hypothesis
$\sigma_\nu^2 = 0$ by setting S/(n - 1) = 116.91 = F(4, 45, p). In this case
we find that p < 0.0001; this is the probability of observing a value of S as
large or larger than the value actually observed in random sampling from
populations with the same population means and variances and with
$\sigma_\nu^2 = 0$. Since p is less than the level 0.05 chosen for rejecting
the hypothesis, we conclude that $\sigma_\nu^2 \ne 0$.

V 

It is of interest now to examine the estimated errors $Y_i - Y_i'$
as given in Table 8.3; note that one of these has a large negative
value: -3.9262. If the experimenter has some actual physical reason
for believing that the third group of measurements might be biased in
this particular direction, he might be led to reject this group entirely.
However, the statistical analysis can give him still further assistance
in arriving at a correct solution. With the third group eliminated,
$n = 4$ and we obtain $a = 18.130 \pm 0.454$; $S = 58.176$, and if we set
$S/(n - 1) = 19.392 = F(3, 36, p)$ we find $p < 0.0001$; thus we still have
statistical evidence of random systematic errors in the remaining 4
group means and conclude that the elimination of the third group of
measurements did not improve matters appreciably.
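The figures quoted for the four remaining groups can be checked from the group means and within-group variances of Table 8.3. The sketch below is illustrative only: the function name is hypothetical, and the form used for the ± figure, $\sqrt{S/((n-1)[w_i])}$, is an assumption that happens to agree with the quoted values rather than a formula stated in this section.

```python
from math import sqrt

def pooled_weighted_mean(means, variances, m):
    """Equal-size groups: pool the within-group variances, weight every
    group mean by w = m / s_pooled^2, and return (a, S, standard error)."""
    n = len(means)
    s2_pooled = sum(variances) / n            # pooled within-group variance
    w = [m / s2_pooled] * n                   # identical weights w_i
    a = sum(wi * yi for wi, yi in zip(w, means)) / sum(w)
    S = sum(wi * (yi - a) ** 2 for wi, yi in zip(w, means))
    se = sqrt(S / ((n - 1) * sum(w)))         # assumed form of the +/- figure
    return a, S, se

# Groups 1, 2, 4 and 5 of Table 8.3 (third group eliminated), m = 10 each.
means = [17.709, 17.483, 19.471, 17.856]
variances = [0.59832, 0.38858, 0.19601, 0.51485]
a, S, se = pooled_weighted_mean(means, variances, m=10)
print(a, S, S / (len(means) - 1), se)
```

The ratio S/(n - 1) printed here is the quantity that is referred to the F distribution with 3 and 36 degrees of freedom.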

Thus we see that the within group variances $s_{\eta i}^2$, although
consistent among themselves, do not measure all of the variance of
the data. Other random errors $v_i$ evidently also occur from one
group to the next, and the experimenter will wish to understand these;
for example, if the n groups of measurements were made on n different
days, an explanation would naturally be sought in terms of possibly
different experimental conditions on these n days.

                                 Table 8.3

      $Y_{i0} = 17$; $\sigma_{\eta i}^2 = 1$ (i = 1 to 5); $\sigma_v^2 = 9$

t                         i = 1      i = 2      i = 3      i = 4      i = 5

1                         17.21      17.43      13.48      19.29      19.29
2                         16.56      16.66      12.04      20.16      16.67
3                         18.19      18.01      13.79      19.70      18.01
4                         17.50      18.43      12.48      18.90      17.81
5                         18.87      17.64      13.45      19.61      17.20
6                         17.96      16.68      13.19      18.76      17.40
7                         17.70      16.65      12.52      20.05      18.01
8                         18.84      17.83      13.81      19.45      18.47
9                         16.83      17.66      14.24      19.45      17.62
10                        17.43      17.84      14.12      19.34      18.08

$Y_i$                    17.709     17.483     13.312     19.471     17.856
$Y_{i0} + v_i$            17.69      17.27      13.31      19.67      17.84
$s_{\eta i}^2$          0.59832    0.38858    0.55447    0.19601    0.51485
$s_\eta^2$              0.45044    0.45044    0.45044    0.45044    0.45044
$w_i$                  22.20031   22.20031   22.20031   22.20031   22.20031
$Y_i - Y_i'$             0.4708     0.2448    -3.9262     2.2328     0.6178
$G_i^2$                 4.92076    1.33040  342.21881  110.67733    8.47334
$p_i$                     0.053       0.28    <0.0001    <0.0001      0.017
$Y_i - a$                 0.709      0.483     -3.688      2.471      0.856
$G_{i0}^2$              11.1597     5.1791   301.9540   135.5516    16.2670
$p_{i0}$                 0.0089      0.050    <0.0001    <0.0001     0.0027

With the third group eliminated:

$w_i$                   23.5606    23.5606          0    23.5606    23.5606
$Y_i - Y_i''$            -0.421     -0.647                 1.341     -0.274
$G_i^2$                  4.1759     9.8627               42.3686     1.7688
$p_i$                     0.073      0.013               <0.0001       0.22

In particular, the analyst will wish to obtain an estimate of the
variance $\sigma_v^2$ of these random systematic errors. The following
derivation of an estimate of $\sigma_v^2$ will be applicable to the general
case in which both $m_i$ and $\sigma_{\eta i}^2$ may differ from group to
group. Consider the minimized sum for our present model which now
involves random "systematic" errors $v_i$:


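    $S = [w_i (Y_i - Y_i')^2]$    (8.24)

    $w_i = m_i/\sigma_{\eta i}^2$    (8.25)

    $Y_i - Y_i' = v_i + \eta_i - \dfrac{[m_i (v_i + \eta_i)/\sigma_{\eta i}^2]}{[m_i/\sigma_{\eta i}^2]}$    (8.26)

    $\eta_i = \dfrac{1}{m_i} \sum_{t=1}^{m_i} \eta_{it}$    (8.27)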


Using these relations it may be shown that the expected value of $S$
may be expressed:

    $E(S) = (n - 1) + \sigma_v^2 \, \dfrac{[w_i]^2 - [w_i^2]}{[w_i]}$    (8.28)

If we replace $\sigma_{\eta i}^2$ by $s_{\eta i}^2$ on both sides of (8.28) we
obtain the following estimate for $\sigma_v^2$:

    $\hat\sigma_v^2 = \left\{ \dfrac{S}{n - 1} - 1 \right\} \dfrac{(n - 1)[w_i]}{[w_i]^2 - [w_i^2]}$    (8.29)
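As a numerical check of (8.29), the following minimal sketch evaluates $\{S/(n-1) - 1\}(n-1)[w_i]/([w_i]^2 - [w_i^2])$ for the two sets of weights tabulated in this section. The function name is hypothetical; the values of S and the weights are those quoted for Tables 8.3 and 8.4.

```python
def estimate_sigma_v2(S, weights):
    """Between-group variance estimate of (8.29); [x] denotes a sum over groups."""
    n = len(weights)
    sum_w = sum(weights)                      # [w_i]
    sum_w2 = sum(w * w for w in weights)      # [w_i^2]
    return (S / (n - 1) - 1.0) * (n - 1) * sum_w / (sum_w ** 2 - sum_w2)

# Table 8.3: five equal weights, S = 467.62
equal_case = estimate_sigma_v2(467.62, [22.20031] * 5)

# Table 8.4: unequal weights w_i = m_i / s_i^2, S = 22.3875
unequal_case = estimate_sigma_v2(22.3875, [15.685, 12.507, 2.9763, 1.3596, 1.5794])

print(equal_case, unequal_case)
```

With equal weights the factor $(n-1)[w_i]/([w_i]^2 - [w_i^2])$ collapses to $1/w$, which is why the pooled case takes a simpler form.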
Note that this reduces, when the variances $\sigma_{\eta i}^2$ are the same so that
the $s_{\eta i}^2$ may be pooled, to the following unbiased estimate:

    $\hat\sigma_v^2 = \dfrac{s_\eta^2}{\bar m} \left\{ \dfrac{S}{n - 1} - 1 \right\}$,  where  $\bar m = \dfrac{[m_i]^2 - [m_i^2]}{(n - 1)[m_i]}$

    ($\sigma_{\eta i}^2 = \sigma_\eta^2$; variances pooled)    (8.30)


Using the result obtained in Section 7 that $S/(n - 1)$ is distributed
approximately as $F(\nu_1, \nu_2)$ with $\nu_1$ and $\nu_2$ defined by (7.1) and (7.2),
respectively, we may use the method of Bross to obtain the following
approximate fiducial distribution for $\sigma_v^2$:

    $\sigma_v^2(1 - p) = \hat\sigma_v^2 \, \dfrac{\dfrac{S}{n - 1} - F(\nu_1, \nu_2, p)}{\dfrac{S \, F(\nu_1, \infty, p)}{n - 1} - F(\nu_1, \nu_2, p)}$    (8.31)

In the above $\sigma_v^2(1 - p)$ denotes the value which the population value
$\sigma_v^2$ will exceed in repeated random sampling from the same normal
population with an approximate fiducial probability of $(1 - p)$. For
values of $p$ so small that $S < (n - 1) F(\nu_1, \nu_2, p)$, $\sigma_v^2(1 - p) = 0$.


I. Bross, "Fiducial intervals for variance components,"
Biometrics, Vol. 6, page 136, 1950.

This is the same as the unbiased estimate of $\sigma_v^2$ determined
by analysis of variance; see page 328 in reference 9 or page 312 in
reference 11.

The subtle distinction between confidence and fiducial
intervals is well described by M. G. Kendall in the book "The Advanced
Theory of Statistics," Vol. II, Chapter 20, Charles Griffin and
Company, London, 1946. For most applications the distinction between
confidence and fiducial intervals is of little importance.
To illustrate the use of the above formula, we will determine
for the example in Table 8.3 the 90% fiducial interval for $\sigma_v^2$:
$\sigma_v^2(0.95) < \sigma_v^2 < \sigma_v^2(0.05)$ with probability 0.9:

    $\sigma_v^2(0.95) = 5.22086 \, \{116.91 - 2.5790\} / \{116.91 \times 2.3719 - 2.5790\} = 2.173$

    $\sigma_v^2(0.05) = 5.22086 \, \{116.91 - 0.17519\} / \{116.91 \times 0.17768 - 0.17519\} = 29.59$

For this particular problem the population value $\sigma_v^2 = 9$ happens to lie
within this 90% fiducial interval; however, the population value would
be expected to lie within only approximately 90% of a large number
of such intervals so constructed from random samples.
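The arithmetic of the two limits can be reproduced with a few lines of code. This is a minimal sketch of (8.31) under the assumption that the required F values are looked up separately; the function name and argument names are hypothetical, and the F values passed in are the ones quoted in the worked example.

```python
def fiducial_limit(sigma_v2_hat, S, n, F_nu1_nu2, F_nu1_inf):
    """Approximate fiducial limit sigma_v^2(1 - p) from (8.31).

    F_nu1_nu2 is F(nu1, nu2, p) and F_nu1_inf is F(nu1, infinity, p),
    both taken from tables for the chosen p."""
    ratio = S / (n - 1)
    if ratio < F_nu1_nu2:          # S < (n - 1) F(nu1, nu2, p): limit is zero
        return 0.0
    return sigma_v2_hat * (ratio - F_nu1_nu2) / (ratio * F_nu1_inf - F_nu1_nu2)

# Table 8.3 example: sigma_v2_hat = 5.22086, S = 467.62, n = 5
lower = fiducial_limit(5.22086, 467.62, 5, 2.5790, 2.3719)     # p = 0.05
upper = fiducial_limit(5.22086, 467.62, 5, 0.17519, 0.17768)   # p = 0.95
print(lower, upper)
```

Keeping the F values as inputs leaves the sketch free of any statistics library; a reader with one available could substitute computed quantiles for the tabulated ones.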


The derivation of the above-described approximate fiducial
distribution for $\sigma_v^2$ depended on the assumption that both the $\eta_{it}$ and
the $v_i$ are samples from normal populations with mean zero. On this
assumption we may also obtain confidence intervals for $a$ from the
data of Table 8.3 in exactly the same way as was done in subsections
8.1 and 8.2; but now, because of the additional variance between
groups, these intervals are naturally much larger: Y*(50%) = 16.478
to 17.998; Y*(95%) = 14.389 to 20.087 and Y*(99.5%) = 11.495 to 22.981.


Since we have only a confidence $p < 0.0001$ that the samples
in Table 8.3 are free from random "systematic" errors, we may
conclude that such errors are, in fact, present and obtain a solution
to our problem on this assumption. The only modification to the
analysis which is required is the replacement of the weights
$w_i = m/s_\eta^2$ by $w_i'' = 1/(\hat\sigma_v^2 + s_Y^2)$ with $s_Y^2 = s_\eta^2/m$.
In the present case, since the 5 weights were the same in determining
the solution (8.23), the revised solution will still be (8.23) since
the $w_i''$ still do not vary from group to group. However, $S''$ will now
be smaller by the factor $s_\eta^2/(s_\eta^2 + m\hat\sigma_v^2) = 0.0085536$ and
$S''/(n - 1) = 1$. Note that $S''$ is not a random variable; consequently,
$S''$ cannot be used for testing the physical hypothesis $a = 16$. However,
this hypothesis can still be tested by setting
$(a - 16)^2/s_a^2 = 1.456 = F(1, n - 1, p)$ and we find $p = 0.30$.
This is the probability of observing a sample departing from $a = 16$
more than the sample in Table 8.3 by random sampling from populations
with both $\eta_{it}$ and $v_i$ normally distributed. Since this value of $p$ is
larger than 0.05, we conclude that the sample in Table 8.3 does not
necessarily provide evidence for rejecting the theory. However, in
this case, since we have statistical evidence for the presence of
random "systematic" errors, these errors (estimated by $Y_i - 16$)
should certainly be thoroughly investigated before the theory is
accepted.
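The remark that S'' is not a random variable follows directly from the pooled estimate (8.30). The short derivation below is a sketch for the case of equal group sizes $m$ and pooled variance $s_\eta^2$, in which every revised weight equals $1/(\hat\sigma_v^2 + s_\eta^2/m)$ and the solution $a$ is unchanged:

```latex
\begin{aligned}
\hat\sigma_v^2 &= \frac{s_\eta^2}{m}\left\{\frac{S}{n-1} - 1\right\}
  \quad\Longrightarrow\quad
  \hat\sigma_v^2 + \frac{s_\eta^2}{m} = \frac{s_\eta^2}{m}\,\frac{S}{n-1}, \\[4pt]
S'' &= \frac{s_\eta^2/m}{\hat\sigma_v^2 + s_\eta^2/m}\; S
     = \frac{S}{S/(n-1)} = n - 1 .
\end{aligned}
```

For the example of Table 8.3, with $s_\eta^2 = 0.45044$, $m = 10$ and $\hat\sigma_v^2 = 5.22086$, the reduction factor is approximately 0.00855 and the product with $S = 467.62$ is approximately 4, that is, $n - 1$.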


8.4  Example  with  Random  "Systematic"  Errors 
and  Unequal  Population  Variances 

The model and the analysis in this case are essentially the
same as in the preceding subsection, although differing slightly in
detail.


Table 8.4 is representative of data which might be obtained
in practice with the population parameters: $a = 17$,
$\sigma_{\eta 1}^2 = \sigma_{\eta 2}^2 = 1$, $\sigma_{\eta 3}^2 = \sigma_{\eta 4}^2 = \sigma_{\eta 5}^2 = 4$,
$m_1 = m_2 = 10$, $m_3 = m_4 = m_5 = 5$; and $\sigma_v^2 = 9$. We find
$a = 18.501$, $S = 22.3875$ and $\hat\sigma_v^2 = 0.83881$. In this case the
revised weights will vary from group to group:

    $w_i'' = 1/(\hat\sigma_v^2 + s_{Yi}^2)$    (8.32)

Using these revised weights we find $a'' = [w_i'' Y_i]/[w_i''] = 19.097$,
and $S = [w_i (Y_i - a'')^2] = 34.4875$. Using this revised estimate of
$S$ in (8.29) we obtain $\hat\sigma_v^2 = 1.3908$ as a revised estimate of
$\sigma_v^2$. This revised value may now be used to obtain a second revision
of $w_i''$; this second revision is given in Table 8.4 and leads to
$a'' = 19.166$ and $S = 37.4528$ so that finally:


    $a = 19.17 \pm 0.52$    (8.33)

Note that the estimate $a''$ involves the weights $w_i''$ since
we wish to average out both within-group and between-group errors;
on the other hand, $S$ is a ratio of between-group and within-group variances
and is therefore defined in terms of the original $w_i$.

                                   Table 8.4

  $Y_{i0} = 17$; $\sigma_{\eta 1}^2 = \sigma_{\eta 2}^2 = 1$; $\sigma_{\eta 3}^2 = \sigma_{\eta 4}^2 = \sigma_{\eta 5}^2 = 4$; $\sigma_v^2 = 9$

t                             i = 1      i = 2      i = 3      i = 4      i = 5

1                             18.23      18.14      21.25      22.19      20.23
2                             18.30      17.01      20.03      17.67      17.25
3                             17.59      18.41      19.23      22.51      19.39
4                             18.53      17.22      17.99      21.05      18.89
5                             17.97      16.89      20.79      20.95      22.09
6                             17.64      19.41
7                             19.38      18.51
8                             18.38      16.59
9                             19.55      17.95
10                            19.88      17.18

$Y_i$                        18.545     17.731     19.858     20.874     19.570
$Y_{i0} + v_i$                17.90      17.78      20.03      20.57      18.65
$s_{\eta i}^2$              0.63754    0.79954    1.67992    3.67768    3.16580
$s_{Yi}^2$                 0.063754   0.079954    0.33598   0.735536    0.63316
$w_i$                        15.685     12.507     2.9763     1.3596     1.5794
$Y_i - a$, a = 18.501        +0.044     -0.770      1.357      2.373      1.069
$G_i^2$                      0.0304     7.4154     5.4807     7.6561     1.8049
$p_i$                          0.86      0.025      0.080      0.051       0.25
$Y_i - a$, a = 17             1.545      0.731      2.858      3.874      2.570
$G_{i0}^2$                   37.440     6.6833     24.311     20.399     10.432
$p_{i0}$                    <0.0001      0.031     0.0079      0.011      0.032
$w_i''$                     0.68750    0.67992    0.57911    0.47029    0.49408
$Y_i - a''$                  -0.621     -1.435      0.692      1.708      0.404
$G_i^2$                      6.0488    25.7547     1.4252     3.9663     0.2578
$p_i$                         0.037     0.0047       0.30       0.12       0.64

Using this second revision of $S$ leads to $\hat\sigma_v^2 = 1.53$. In this case
$S/(n - 1)$ is distributed approximately as $F(\nu_1, \nu_2)$ with
$\nu_1 = (n - 1) = 4$ as before, but with $\nu_2 = 6.9787$ as defined by (7.2),
and we obtain by (8.31) the following 90% fiducial interval for $\sigma_v^2$:

    $0.44 < \sigma_v^2 < 9.4$    (8.34)

Again the population value $\sigma_v^2 = 9$ happens to lie within the 90%
fiducial interval, but in 10% of the cases of random sampling it
would not be expected to do so.
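The successive revisions of the weights in this subsection can be sketched as a short loop. The code below is an illustration under stated assumptions, not a definitive implementation: the function name is hypothetical, the inputs are the group means, within-group variances and group sizes of Table 8.4, and a negative variance estimate is clipped to zero as a safeguard that the text does not discuss. As in the text, $a''$ uses the revised weights of (8.32) while $S$ keeps the original weights $w_i = m_i/s_{\eta i}^2$.

```python
def iterate_weights(means, s2, m, rounds=2):
    """Return [(a, S, sigma_v2_hat), ...] for the initial fit and each revision."""
    n = len(means)
    w = [mi / si for mi, si in zip(m, s2)]           # original weights w_i
    s2_mean = [si / mi for mi, si in zip(m, s2)]     # variance of each group mean
    sum_w, sum_w2 = sum(w), sum(x * x for x in w)
    k = (n - 1) * sum_w / (sum_w ** 2 - sum_w2)      # factor appearing in (8.29)

    def fit(weights):
        a = sum(wi * yi for wi, yi in zip(weights, means)) / sum(weights)
        S = sum(wi * (yi - a) ** 2 for wi, yi in zip(w, means))  # original w_i
        return a, S, (S / (n - 1) - 1.0) * k

    steps = [fit(w)]
    for _ in range(rounds):
        sigma_v2 = max(steps[-1][2], 0.0)            # assumed safeguard
        steps.append(fit([1.0 / (sigma_v2 + v) for v in s2_mean]))  # (8.32)
    return steps

means = [18.545, 17.731, 19.858, 20.874, 19.570]
s2 = [0.63754, 0.79954, 1.67992, 3.67768, 3.16580]
m = [10, 10, 5, 5, 5]
steps = iterate_weights(means, s2, m)
for a, S, sigma_v2 in steps:
    print(a, S, sigma_v2)
```

The three printed rows correspond to the initial solution and the first and second revisions described above; small differences in the last digits relative to the quoted figures arise from rounding of the tabulated weights.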

We may also obtain the following approximate confidence
intervals for $a$: Y*(50%) = 18.78 to 19.56; Y*(95%) = 17.73 to 20.61
and Y*(99.5%) = 16.26 to 22.08.

It is of interest to note that the estimate $a = 18.501$ obtained
with the weights $w_i$ happens in this case to be nearer the population
value $a = 17$ than the more properly weighted estimate $a'' = 19.166$.
This will occur occasionally because of sampling fluctuations;
nevertheless, the use of the procedure described above is recommended
since it will yield better results on the average for a large number of
samples.


This  completes  the  discussion  of  the  one  variable  problem. 
All  of  the  above  methods  of  analysis  have  their  counterparts  in 
least  squares  analyses  involving  several  random  variables  and,  in 
some  respects,  the  methods  of  handling  these  more  general  problems 
are  the  same  as  for  one  variable.  Thus  the  random  systematic  errors 
in  the  multivariate  problem  are  estimated  in  much  the  same  way; 
unfortunately,  however,  the  allocation  of  the  several  components  of 
these  random  systematic  errors  to  the  corresponding  random  variables 
by  this  method  is  necessarily  somewhat  arbitrary,  and  this  is  one  of 
the  principal  difficulties  of  extending  the  method  of  least  squares  as 
formulated  in  Sections  2,  3 and  4 to  include  the  effects  of  random 
systematic  errors.  In  the  particular  case  of  fitting  a straight  line  to 
two  random  variables,  Wald’s  method,  as  generalized  in  Section  11, 
eliminates  some  of  this  arbitrariness  of  including  the  effects  of  random 
systematic  errors,  but  no  method  is  presently  available  which  leads  to 
a completely  unambiguous  solution  in  all  cases. 


CONCLUSION 


It  is  unfortunate  that  the  unique  solution  to  the  problem  of 
fitting  a series  of  points  to  a specified  functional  relation  is  so 
complex  in  the  general  case  where  more  than  one  of  the  observed 
variates  determining  these  points  is  subject  to  error.  However, 
no  short  cuts  have  been  found  to  a correct  statistical  understanding 
of  experimental  data  obtained  under  these  rather  typical  conditions. 
It  is  hoped  that  the  methods  presented  herein  are  in  sufficiently 
usable  form  that  they  will  be  employed  by  experimenters  wishing 
to  obtain  consistent  conclusions  from  their  analyses  of  experimental 
data. 


In those cases where the experiments are still in the planning
stage, use may often be made of the methods presented in this
paper to design the experiments in such a way that repeated observations
of the coordinates of each point become available; in this
way  more  nearly  optimum  use  may  be  made  of  statistical  theory 
in  the  analysis  of  the  resulting  experimental  data. 
APPENDIX I

Maximum Likelihood Estimation of the 3n + 2
Parameters $X_{i0}$, $Y_{i0}$, $\sigma_{\eta i}^2$, $\rho$, and $C^2$ for k = 2
Normally Distributed Variables

A statistical model often encountered in least squares fitting
when only two of the variables are random involves n bivariate
distributions with 2n different means and 2n different variances,
but with the same correlation coefficient $\rho$ and the same ratio $C^2$
between the 2n variances: $\sigma_{\epsilon i}^2 = C^2 \sigma_{\eta i}^2$. If the two
random variables are both normally distributed, we may use the method
of maximum likelihood to obtain consistent and asymptotically efficient
estimates of the 3n + 2 parameters defining these n bivariate
distributions.


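    $L = [\ell_i]$    (I-1)

    $\ell_i = \sum_{t=1}^{m_i} \left\{ -\log\left( 2\pi C \sigma_{\eta i}^2 \sqrt{1 - \rho^2} \right) - \dfrac{f_{it}}{2\sigma_{\eta i}^2 (1 - \rho^2)} \right\}$    (I-2)

    $f_{it} = \dfrac{(X_{it} - X_{i0})^2}{C^2} - \dfrac{2\rho (X_{it} - X_{i0})(Y_{it} - Y_{i0})}{C} + (Y_{it} - Y_{i0})^2$    (I-3)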


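The maximum likelihood estimates $\hat X_{i0}$, $\hat Y_{i0}$ of the 2n mean values
are determined by solving simultaneously the following 2n equations:

    $\dfrac{\partial L}{\partial X_{i0}} = \dfrac{\partial \ell_i}{\partial X_{i0}} = \dfrac{1}{(1 - \rho^2) \sigma_{\eta i}^2 C} \sum_{t=1}^{m_i} \left\{ \dfrac{X_{it} - X_{i0}}{C} - \rho (Y_{it} - Y_{i0}) \right\} = 0$    (I-4)

    $\dfrac{\partial L}{\partial Y_{i0}} = \dfrac{\partial \ell_i}{\partial Y_{i0}} = \dfrac{1}{(1 - \rho^2) \sigma_{\eta i}^2} \sum_{t=1}^{m_i} \left\{ (Y_{it} - Y_{i0}) - \dfrac{\rho (X_{it} - X_{i0})}{C} \right\} = 0$    (I-5)

    $\hat X_{i0} = \dfrac{1}{m_i} \sum_{t=1}^{m_i} X_{it} = X_i$    (I-6)

    $\hat Y_{i0} = \dfrac{1}{m_i} \sum_{t=1}^{m_i} Y_{it} = Y_i$    (I-7)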


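The maximum likelihood estimates $\hat\sigma_{\eta i}^2$, $\hat\rho$, and $\hat C$ of the
remaining n + 2 parameters are determined by solving simultaneously
the following n + 2 equations:

    $\dfrac{\partial L}{\partial \sigma_{\eta i}^2} = \dfrac{\partial \ell_i}{\partial \sigma_{\eta i}^2} = \sum_{t=1}^{m_i} \left\{ -\dfrac{1}{\hat\sigma_{\eta i}^2} + \dfrac{\hat f_{it}}{2 (1 - \hat\rho^2) \hat\sigma_{\eta i}^4} \right\} = 0$    (I-8)

    $\dfrac{\partial L}{\partial \rho} = \left[ \sum_{t=1}^{m_i} \left\{ \dfrac{\hat\rho}{1 - \hat\rho^2} - \dfrac{\hat\rho \hat f_{it}}{(1 - \hat\rho^2)^2 \hat\sigma_{\eta i}^2} + \dfrac{(X_{it} - X_i)(Y_{it} - Y_i)}{\hat C (1 - \hat\rho^2) \hat\sigma_{\eta i}^2} \right\} \right] = 0$    (I-9)

    $\dfrac{\partial L}{\partial C} = \left[ \sum_{t=1}^{m_i} \left\{ -\dfrac{1}{\hat C} + \dfrac{(X_{it} - X_i)^2/\hat C^2 - \hat\rho (X_{it} - X_i)(Y_{it} - Y_i)/\hat C}{\hat C (1 - \hat\rho^2) \hat\sigma_{\eta i}^2} \right\} \right] = 0$    (I-10)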


In the above equations $\hat f_{it}$ denotes the value of $f_{it}$ with the parameters
replaced by their maximum likelihood estimates. Using the sample
values defined by (2.2) and (2.4), the above n + 2 equations may be
expressed:

    $\hat\sigma_{\eta i}^2 = \dfrac{(m_i - 1)}{2 m_i (1 - \hat\rho^2)} \left\{ \dfrac{s_{\epsilon i}^2}{\hat C^2} - \dfrac{2 \hat\rho \, r_i s_{\epsilon i} s_{\eta i}}{\hat C} + s_{\eta i}^2 \right\}$    (I-11)

    $\hat\rho = \dfrac{1}{[m_i] \hat C} \left[ \dfrac{(m_i - 1) \, r_i s_{\epsilon i} s_{\eta i}}{\hat\sigma_{\eta i}^2} \right]$    (I-12)

    $\hat C^2 = \dfrac{1}{[m_i]} \left[ \dfrac{(m_i - 1) \, s_{\epsilon i}^2}{\hat\sigma_{\eta i}^2} \right]$    (I-13)

It does not appear to be useful to further separate the variables in
the above equations since they may be solved directly in their present
form by numerical methods involving an iterative process. For
example, estimates of $\rho$ and $C$ may be obtained by setting
$\hat\sigma_{\eta i}^2 \cong s_{\eta i}^2 (m_i - 1)/m_i$ so that:

    $C_{est}^2 = \dfrac{1}{[m_i]} \left[ \dfrac{m_i s_{\epsilon i}^2}{s_{\eta i}^2} \right]$    (I-14)

    $\rho_{est} = \dfrac{1}{[m_i] C_{est}} \left[ \dfrac{m_i r_i s_{\epsilon i}}{s_{\eta i}} \right]$    (I-15)

Using these estimated values in (I-11), revised estimates of $\hat\sigma_{\eta i}^2$ may
be obtained; these may then be substituted in (I-12) and (I-13) to obtain
better estimates of $\rho$ and $C$. The above process could be repeated,
if better accuracy were required, but this would presumably not often
be the case in practice since the values $\hat\sigma_{\eta i}^2$, $\hat\rho$ and $\hat C$ are
themselves only estimates of the population values of these parameters,
and these estimated values are required in our least squares solution
only in the determination of the relative weights.
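The iterative process just described may be sketched as follows. This is a minimal illustration under assumptions: the function and argument names are hypothetical, the inputs are the per-group sample variances $s_{\epsilon i}^2$, $s_{\eta i}^2$ and sample correlations $r_i$, and the update for $C$ uses the form $\hat C^2 = [(m_i - 1) s_{\epsilon i}^2/\hat\sigma_{\eta i}^2]/[m_i]$, which follows from the likelihood equations for the model stated at the head of this appendix.

```python
from math import sqrt

def ml_estimates(m, s_eps2, s_eta2, r, rounds=1):
    """Starting values for C and rho, then `rounds` passes through the
    variance, correlation and ratio equations. Returns (sigma2_list, rho, C)."""
    M = sum(m)                                            # [m_i]
    groups = list(zip(m, s_eps2, s_eta2, r))
    # Starting values, from setting sigma_i^2 = s_eta_i^2 (m_i - 1) / m_i
    C = sqrt(sum(mi * se / sh for mi, se, sh, ri in groups) / M)
    rho = sum(mi * ri * sqrt(se / sh) for mi, se, sh, ri in groups) / (M * C)
    sigma2 = [sh * (mi - 1) / mi for mi, se, sh, ri in groups]
    for step in range(rounds):
        # Revised variance estimates for each group
        sigma2 = [
            (mi - 1) / (2 * mi * (1 - rho ** 2))
            * (se / C ** 2 - 2 * rho * ri * sqrt(se * sh) / C + sh)
            for mi, se, sh, ri in groups
        ]
        # Revised correlation, then revised ratio C
        rho_new = sum((mi - 1) * ri * sqrt(se * sh) / sg
                      for (mi, se, sh, ri), sg in zip(groups, sigma2)) / (M * C)
        C = sqrt(sum((mi - 1) * se / sg
                     for (mi, se, sh, ri), sg in zip(groups, sigma2)) / M)
        rho = rho_new
    return sigma2, rho, C

# Hypothetical sample values chosen so that s_eps^2 = 4 s_eta^2 and r = 0.5 in every group.
m = [10, 5, 8]
s_eta2 = [1.0, 4.0, 2.25]
s_eps2 = [4.0, 16.0, 9.0]
r = [0.5, 0.5, 0.5]
sigma2, rho, C = ml_estimates(m, s_eps2, s_eta2, r)
print(sigma2, rho, C)
```

In this constructed example the starting values already satisfy the equations, so a pass through the loop leaves $\rho$ and $C$ unchanged; with real data the first pass would normally move them slightly.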


Fortunately, a case often encountered in our least squares
application is the one in which the magnitudes of $\rho$ and $C$ are known
a priori and thus need not be estimated by the method of maximum
likelihood from the data. In this special case the n maximum likelihood
estimates of $\sigma_{\eta i}^2$ may be expressed:

    $\hat\sigma_{\eta i}^2 = \dfrac{(m_i - 1)}{2 m_i (1 - \rho^2)} \left\{ \dfrac{s_{\epsilon i}^2}{C^2} - \dfrac{2 \rho \, r_i s_{\epsilon i} s_{\eta i}}{C} + s_{\eta i}^2 \right\}$    (I-16)

If we determine the expected value of (I-16) we find that
$E(\hat\sigma_{\eta i}^2) = \dfrac{(m_i - 1)}{m_i} \sigma_{\eta i}^2$. Thus the bias of the maximum
likelihood estimate $\hat\sigma_{\eta i}^2$ as given by (I-16) may be removed by
multiplying by the factor $m_i/(m_i - 1)$. When $\rho$ and $C$ are unknown,
the bias of $\hat\sigma_{\eta i}^2$ (as given by (I-11)) is not as readily determined
but, in the absence of better information, the same factor may be used:
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